Career Development Award:


Development  of  an  Integrated  Program  of  Health-Related  Quality  of  Life 
Research  for  the  National  Surgical  Adjuvant  Breast  and  Bowel  Project 

Richard  Day,  Ph.D. 

Department  of  Biostatistics 
University  of  Pittsburgh 

Fourth  Annual  Progress  Report 
September  1, 2000  to  August  31,  2001 


1.  Introduction 

This  Career  Development  Award  (CDA)  was  specifically  intended  to  support  Dr. 
Day  in  the  development  of  a  Health-Related  Quality  of  Life  Program  (HRQL)  for 
the  National  Surgical  Adjuvant  Breast  and  Bowel  Project  (NSABP).  Specific  aims 
proposed  for  the  CDA  included:  (a)  Design  and  implementation  of  new  HRQL 
components  for  planned  NSABP  treatment  and  prevention  trials;  (b)  testing  and 
implementation  of  data  collection  methods  to  be  used  in  treatment  and 
prevention  trials;  (c)  analysis  of  HRQL  data  collected  in  the  NSABP  prevention 
and treatment trials; (d) refinement and extension of HRQL methods to analyze
the  data  from  new  treatment  and  prevention  studies;  (e)  enhancement  of  minority 
participation in NSABP trials. Work completed during the fourth 12 months of Dr.
Day’s  CDA  will  be  summarized  in  terms  of  these  aims. 
2. Body

2.1  Design  and  implementation  of  new  HRQL  breast  cancer  components 
for  planned  NSABP  treatment  and  prevention  trials. 

Update of Health-Related Quality of Life (HRQL) protocols implemented
as part of Dr. Day's CDA:

a.  Protocol  no.  B-33  -  A  Randomized,  Placebo  Controlled,  Double- 
Blinded  Trial  Evaluating  the  Effect  of  Exemestane  in  Stage  I  and  II 
Post-Menopausal  Breast  Cancer  Patients  Completing  at  least  Five 
Years  of  Tamoxifen  Therapy.  The  HRQL  component  of  this  protocol 
will  involve  300  patients  randomized  to  each  arm  of  the  trial.  It  is 
expected that exemestane will have significant effects on the patients'
quality of life. Because exemestane is an extremely effective aromatase
inhibitor, the resulting lack of estrogen is expected to be associated
with an increase in the frequency and intensity of menopausal symptoms.
It is expected that any increase in symptoms will occur relatively
quickly after the initiation of exemestane and will remain stable as
long as the medication is taken. The instrument selected for use in this
study is the Menopause Specific Quality of Life Questionnaire. Use of
this instrument will permit a comparison of the B-33 findings to a study
of letrozole being carried out by the NCI of Canada. Current Status
(09/01/01):  This  trial  has  just  opened  this  month  and  3  patients 
have  so  far  been  recruited  into  the  HRQL  study. 

b.  Protocol  no.  B-32  -  A  Randomized,  Phase  III  Clinical  Trial  to 
Compare  Sentinel  Node  Resection  to  Conventional  Axillary 
Dissection  in  Clinically  Node  Negative  Breast  Cancer  Patients. 
Axillary  lymph  node  resection  is  generally  performed  on  women  with 
operable  breast  cancer  in  order  to  aid  in  the  determination  of  staging, 
need  for  adjuvant  therapy,  and  regional  control.  Although  rarely  life- 
threatening,  axillary  lymph  node  dissection  is  associated  with  significant 
morbidity.  Patients  often  experience  reduced  mobility  of  the  shoulder 
and  require  physical  therapy  to  regain  full  function  of  the  upper 
extremity.  Lymphedema  has  been  reported  by  30%  of  women  who  have 
had  conservative  breast  surgery  with  axillary  dissection.  In  B-32,  axillary 
dissection  is  compared  to  a  new  surgical  method,  sentinel  node 
resection.  Sentinel  node  resection  usually  requires  the  identification 
and  removal  of  a  single  lymph  node  (or  a  small  number  of  nodes)  from 
the  axilla.  Because  the  procedure  involves  much  less  extensive  surgery 
than  traditional  axillary  dissection,  we  expect  that  it  will  result  in  less 
morbidity  and  allow  for  more  rapid  return  to  normal  activity,  with  fewer 
long-term sequelae.
This study involved the development and testing of a new,
self-administered HRQL questionnaire (Physical Functioning
Questionnaire).  In  addition,  we  are  using  a  general  QOL  Rating  Scale 
(0-10),  anchored  by  death  and  perfect  health.  This  scale  has  been 
used  in  previous  NSABP  studies.  Approximately  325  patients  from 
each  study  arm,  stratified  by  type  of  operation  (lumpectomy/ 
mastectomy), will be included in this study. Current Status
(09/01/01):  The  HRQL  component  of  this  trial  is  now  open  and  19 
patients  have  so  far  completed  the  baseline  examination. 


c.  Protocol  no.  B-30  -  A  Three  Arm  Randomized  Trial  to  Compare 
Adjuvant  Adriamycin  and  Cyclophosphamide  Followed  by 
Taxotere  (AC-T);  Adriamycin  and  Taxotere  (AT);  and  Adriamycin, 
Taxotere  and  Cyclophosphamide  (ATC)  in  Breast  Cancer  Patients 
with Positive Axillary Lymph Nodes. Quality of life measures included
in the B-30 questionnaire are the Functional Assessment of Cancer
Therapy-Breast  (FACT-B),  a  treatment  specific  Symptom  Checklist 
(SCL),  the  SF-36  Vitality  Scale,  and  an  overall  HRQL  rating  scale.  Two 
additional  questionnaires  (Baseline  and  Follow-Up  Menstrual  History 
Questionnaires)  were  developed  and  tested  in  order  to  measure 
ovarian  damage  occurring  as  a  long-term  sequela  of  adjuvant 
chemotherapy. Current Status (09/01/01): This trial was closed to
recruitment  on  7/20/2001  and  includes  2107  patients  in  the  HRQL 
study. 

d.  Protocol  no.  P-2  -  Study  of  Tamoxifen  and  Raloxifene  (STAR).  This 
is  the  new  NSABP  prevention  study  following  on  the  positive  results  of 
the  P-1  (Breast  Cancer  Prevention  Trial)  Protocol.  A  new  HRQL 
component  was  developed  and  approved  by  the  National  Cancer 
Institute  and  integrated  into  the  study  protocol.  The  P-2  HRQL 
questionnaire  will  be  given  to  a  sub-sample  (n=3000)  of  the  complete 
STAR  cohort  (22,000  women);  the  NSABP  application  to  the  Cancer 
Prevention  and  Control  Protocol  Review  Committee  was  approved  to 
give cancer control credits to CCOPs participating in this research.
Current  Status  (09/01/01):  The  HRQL  component  of  the  P-2  trial  has 
recruited  a  total  of  2118  participants  into  the  substudy. 

e. Protocol C-07 - Trial Comparing 5-Fluorouracil (5-FU) Plus
Leucovorin (LV) and Oxaliplatin with 5-FU Plus LV for the
Treatment of Patients with Stages II and III Carcinoma of the Colon.
This study uses the 11-item FACT/GOG-NTX scale in order to obtain
the  patients'  subjective  assessment  of  neurotoxicity  attendant  upon  the 
administration  of  Oxaliplatin.  Current  Status  (09/01/01):  The  HRQL 
component  of  this  trial  was  closed  on  7/31/01  following  the 
recruitment  of  400  patients  to  the  substudy. 

One new protocol is being developed for inclusion in the P-2 STAR trial:

f. Protocol STAR-Cog - Effects of Selective Estrogen Receptor
Modulators on Cognitive Aging: A Study of Tamoxifen, Raloxifene
and Cognition. This is a direct collaboration with the National Institute
on Aging (NIA) and the Women's Health Initiative Study of Cognitive
Aging  (WHISCA).  The  proposed  study  examines  the  longitudinal 
cognitive  outcomes  in  1800  STAR  participants  using  the  same  battery 
of  neuropsychological  instruments  utilized  in  WHISCA.  This  protocol  is 
currently  awaiting  the  approval  of  funding  from  NIA. 

2.2  Testing  and  implementation  of  data  collection  methods  to  be  used  in 
treatment  and  prevention  trials 

Operational  Procedures  to  Reduce  Missing  and  Delinquent  HRQL  Data 
-  Over  the  past  12  months,  procedures  intended  to  reduce  missing  and 
delinquent  data  which  were  implemented  during  months  24-36  (i.e.,  1999- 
2000)  have  been  continued,  intensified  and  refined.  Specific  elements  of 
this  strategy  include:  (1)  The  use  of  missing  data  forms;  (2)  the  inclusion  of 
HRQL  questionnaires  in  delinquency  assessments;  (3)  periodic  HRQL 
training  sessions  at  national  meetings;  and,  (4)  the  routine  notification  of 
study  coordinators  of  scheduled  HRQL  examinations.  Missing  and 
delinquent data continue to be a difficult issue, however, and overall
compliance  rates  for  most  HRQL  studies  remain  at  approximately  70%. 


2.3  Analysis  of  HRQL  Data  Collected  in  the  NSABP  Prevention  and 
Treatment  Trials: 

a.  Peer  Reviewed  Papers: 

Day  R,  Ganz  PA,  Costantino  JC.  Tamoxifen  and  Depression:  More 
Evidence  from  the  NSABP's  Breast  Cancer  Prevention  (P-1) 
Randomized  Study.  JNCI,  93,  2001  (in  press,  7  Nov.  2001  issue). 
(Appendix  1) 

Day R. Quality of life and tamoxifen in breast cancer: a summary of the
findings  from  the  NSABP  P-1  study.  Annals  of  the  New  York  Academy 
of  Sciences  (in  press).  (Appendix  2) 

Land  S,  Wieand  S,  Day  R,  Have  T,  Costantino  J,  Lang  W,  Ganz  P. 
Methodological  issues  in  the  analysis  of  quality  of  life  data  in  clinical 
trials:  illustrations  from  the  NSABP  Breast  Cancer  Prevention  Program. 
In:  M.  Mesbah,  B.  Cole,  M  Lee  (eds.).  Statistical  Design,  Measurement 
and Analysis of Health Related Quality of Life. Kluwer Academic
Publishers  (in  press).  (Appendix  3) 

Kiebert G, Wait S, Bernhard J, Bezjak A, Cella D, Day R, Houghton J,
Moinpour C, Scott C, Stephens C. Practice and policy of measuring
quality of life and health economics in cancer clinical trials: a survey
among cooperative groups. Quality of Life Research 2000; 9(10): 1073-80.
(Appendix 4)

b.  Submitted  papers: 

Day R, Cella D, Ganz PA, Daly MB, Rowland J, Wolter J. Determining the
Feasibility  and  Usefulness  of  Microelectronic  Adherence  Monitoring 
Compared  to  Pill  Counts  and  Self-Reports  in  a  Large,  Multicenter 
Chemoprevention  Trial.  Submitted  to  Controlled  Clinical  Trials  (in  revision). 

c.  Papers  in  progress: 

With Dr. Patricia Ganz and Dr. David Cella (mentors): Factor analysis
of the P-1 43-item Symptom Checklist data. Initial analyses suggest
that most of the variance in baseline SCL scores can be explained by a
small number (7 or 8) of independent latent variables. The goal of this
paper  is  to  simplify  the  SCL  for  future  prevention  studies  and  assess  the 
stability  of  these  initial  latent  factors  on  follow-up  in  the  tamoxifen  and 
placebo  arms. 

2.4  Refinement  and  extension  of  HRQL  methods  to  analyze  the  data  from 
new  treatment  and  prevention  studies 

Day,  Ganz,  and  Costantino  (2001,  Appendix  1)  developed  and  tested 
methods  for  assessing  depression  risk  and  missing  data  in  the  P-1  data. 

Land, Wieand, Day et al. (2001, Appendix 3) extended and refined missing
data  techniques  in  the  previous  paper. 

2.5  Enhancement  of  minority  participation  in  NSABP  trials  and  the 
implementation  of  measures  focusing  on  HRQL-related  issues  in 
women  of  color 

This  objective  is  complete. 
Reportable Outcomes


a.  Peer  Reviewed  Papers: 

Day  R,  Ganz  PA,  Costantino  JC.  Tamoxifen  and  Depression:  More 
Evidence  from  the  NSABP's  Breast  Cancer  Prevention  (P-1)  Randomized 
Study.  JNCI,  93,  2001  (in  press,  7  Nov.  2001  issue).  (Appendix  1) 

Day R. Quality of life and tamoxifen in breast cancer: a summary of the
findings  from  the  NSABP  P-1  study.  Annals  of  the  New  York  Academy  of 
Sciences  (in  press).  (Appendix  2) 

Land S, Wieand S, Day R, Have T, Costantino J, Lang W, Ganz P.
Methodological issues in the analysis of quality of life data in clinical trials:
illustrations from the NSABP Breast Cancer Prevention Program. In: M.
Mesbah, B. Cole, M Lee (eds.). Statistical Design, Measurement and
Analysis of Health Related Quality of Life. Kluwer Academic Publishers (in
press). (Appendix 3)

Kiebert G, Wait S, Bernhard J, Bezjak A, Cella D, Day R, Houghton J,
Moinpour C, Scott C, Stephens C. Practice and policy of measuring quality
of life and health economics in cancer clinical trials: a survey among
cooperative groups. Quality of Life Research 2000; 9(10): 1073-80.
(Appendix 4)

b.  Submitted  papers: 

Day R, Cella D, Ganz PA, Daly MB, Rowland J, Wolter J. Determining the
Feasibility  and  Usefulness  of  Microelectronic  Adherence  Monitoring  Compared  to 
Pill  Counts  and  Self-Reports  in  a  Large,  Multicenter  Chemoprevention  Trial. 
Submitted  to  Controlled  Clinical  Trials  (in  revision). 
5. Conclusions


There  are  now  a  total  of  3  ongoing  NSABP  HRQL  studies  in  the  process  of  data 
collection (P-2, B-32, B-33). Two studies which were underway at the last
reporting date have now been successfully closed to recruitment (B-30, C-07).
For  the  past  12  months  of  the  CDA,  Dr.  Day  has  had  reduced  responsibility  for 
day-to-day  monitoring  of  ongoing  protocols  and  has  served  as  a  consultant  for 
study  development.  This  has  permitted  him  to  devote  an  increased  amount  of 
time  in  months  37-48  of  the  grant  to  the  completion  of  manuscripts  and  reports. 
A no-cost 12-month extension of the award has been approved, enabling Dr. Day
to  prepare  additional  analyses  and  publications  on  NSABP  quality-of-life  data. 
Tamoxifen and Depression: More Evidence From the
National  Surgical  Adjuvant  Breast  and  Bowel  Project’s 
Breast  Cancer  Prevention  (P-1)  Randomized  Study 


Richard Day, Patricia A. Ganz, Joseph P. Costantino


Background: Concerns have been raised that tamoxifen may
be associated with depression. To investigate this question,
we examined the psychological effects of tamoxifen treatment
for breast cancer prevention on women at different
levels of risk for clinical depression who were enrolled in the
National Surgical Adjuvant Breast and Bowel Project's
Breast Cancer Prevention (P-1) Study. Methods: A total of
11 064 women were randomly assigned to receive for 5 years
daily doses of 20 mg of tamoxifen or placebo in the P-1 study,
a multicenter, double-blind, placebo-controlled chemoprevention
trial. Each woman was prospectively assessed for
depression risk on the basis of medical history items collected
at the baseline examination and placed in a high-,
medium-, or low-risk group. Every 6 months, for a total of
36 months, the participants were assessed for depressive
symptoms by completing the Center for Epidemiological
Studies-Depression (CES-D) questionnaire. Scores of 16 or
higher were indicative of an episode of affective distress.
Differences between the risk groups and treatment arms
were analyzed by logistic regression. All statistical tests were
two-sided. Results: Women in the higher risk depression
groups were more likely to score 16 or higher on the CES-D
(percent follow-up examinations with a score of ≥16: high-risk
group = 35.7%, with 95% confidence interval [CI] =
32.5% to 38.9%; medium-risk group = 19.2%, with 95% CI =
18.1% to 20.3%; and low-risk group = 8.7%, with 95% CI
= 8.3% to 9.1%) and to have these scores more frequently and
for longer periods than women in the lower risk groups.
Within each depression risk group, there was no difference
in the proportion of women scoring 16 or higher by treatment
assignment (tamoxifen versus placebo) (odds ratio =
0.98; 95% CI = 0.93 to 1.02). A post-hoc analysis indicated
that the lack of a tamoxifen effect was not a result of differential
missing data. Conclusions: Physicians need not be
overly concerned that treatment with tamoxifen will increase
the risk for or exacerbate existing depression in women.
Nevertheless, physicians should continue to screen for and treat
or refer potential cases of depression encountered in routine
clinical practice. [J Natl Cancer Inst 2001;93:000-00]


Concern regarding an association between clinical depression
and tamoxifen, when used as an adjuvant treatment or preventative
agent for breast cancer, has been voiced by a number of
investigators (1-5) and continues to be discussed in regulatory
agencies, such as the U.S. Food and Drug Administration.
Furthermore, the Physicians' Desk Reference (6) lists "depression"
as an infrequent adverse reaction to tamoxifen. Although previous
studies (1-5) used breast cancer patients to address tamoxifen
use and depression, the studies had a number of weaknesses,
including the lack of a clear definition of depression and a failure
to control for the potential confounding effects of illness
diagnosis, the side effects of chemotherapy (e.g., premature
menopause), or normal aging. Previously, two double-blind,
placebo-controlled studies of the effects of tamoxifen in postmenopausal
women (7,8) found no association of tamoxifen with depression.
We believe that some of the concern over the relationship
between tamoxifen and depression arises from the idea that,
because hormone replacement therapy has positive effects on
mood and tamoxifen has antiestrogenic activity (9-11),
tamoxifen, therefore, has negative effects on mood.

The completion of the Breast Cancer Prevention (P-1) Study
of the National Surgical Adjuvant Breast and Bowel Project
(NSABP) provides an opportunity to investigate the association
between tamoxifen and depression in greater detail. The P-1
study was a multicenter, double-blind, placebo-controlled
chemoprevention trial. The primary objective of the study was to
evaluate whether 5 years of tamoxifen therapy would reduce the
incidence of invasive breast cancer in women at an increased
risk for the disease. The secondary objectives of the study
included the assessment of the incidence of ischemic heart disease,
bone fractures, and other negative health events, such as
depression, that might be associated with tamoxifen therapy. Eligible
participants were randomly assigned to receive 20 mg daily of
tamoxifen or a placebo for 5 years. Detailed reports on the
rationale, planning, design, and clinical outcome of the P-1 study
are available elsewhere (12-16).

In our initial publication on the health-related quality of life
(HRQL) (16) of all subjects in the P-1 study, we did not find a
difference between the treatment groups (tamoxifen versus placebo)
on the Center for Epidemiological Studies-Depression
(CES-D) Scale (17) or the SF-36 Mental Health Scale (18). It is
known, however, that vulnerability to clinically identifiable
forms of depression is not uniformly distributed in the general
female population but, instead, clusters in high-risk groups of
women (19). This vulnerability to depression may be inherited,
suggesting a genetic or familial origin, or it may be related to
certain psychological predispositions, such as a low self-esteem,
a poor resistance to stress, or a pessimistic view of the world. We
were concerned that the potential negative effects of tamoxifen
for women at high risk for depression may have been masked in
our previous analysis (16) because of the simultaneous inclusion
of a larger group of less vulnerable (i.e., low-risk) participants.

In this study, we investigated the effects of tamoxifen on
women at different levels of risk for depression. Specifically, we
were interested in whether tamoxifen treatment was associated
with the onset or prolonged the length of existing episodes of
clinically diagnosable depression in women at high risk for
depression.

Patients  and  Methods 

Participant  Cohort  and  HRQL  Data 

This article covers the baseline and first 36 months of follow-up data
(collected at 6-month intervals) on the same 11 064 women used in the initial HRQL
report (16) from the P-1 study. The P-1 participants ranged in age from 35 years
to 70 years (mean ± standard deviation = 53.8 ± 0.2 years), were predominantly
white (05.6%), were well educated (≥ some college = 64.0%), and were currently
employed (full- or part-time = 64.7%) in a professional or technical field
(07.9%). A detailed description of this cohort of participants and the P-1 HRQL
instruments was reported previously (14,16). All investigations conducted in the
P-1 study were approved by review boards at each institution and were in accord
with an assurance filed with and approved by the U.S. Department of Health and
Human Services (12). All of the participants provided written informed consent.

Defining Depression

Depressive disorders, defined by the current psychiatric nomenclature in
the Diagnostic and Statistical Manual of Mental Disorders: DSM-IV (DSM-IV)
(20), come in a variety of forms that differ on the basis of the number, severity,
and persistence of symptoms. The majority of clinically diagnosable episodes of
depression involve one of three disorders: major depression, dysthymia, or
bipolar disorder (19). Major depression involves an illness episode lasting at
least 2 weeks that includes mood disturbance (dysphoria) and at least four of the
following symptoms: sleep disturbance, change in psychomotor activity, loss of
ability to experience pleasure and interest, fatigue, feelings of worthlessness or
guilt, difficulty in concentrating, and a preoccupation with death or a wish to die.
These symptoms must be associated with a clear impairment in social
functioning. Dysthymic disorder or dysthymia is a chronic illness lasting at least 2 years.
Dysthymia does not show the same levels of social impairment found in major
depression, but it does involve mood disturbance (dysphoria) and a loss of the
ability to experience pleasure and interest in usual activities, together with some
of the other symptoms used to define major depression. Individuals diagnosed
with dysthymia often experience episodes of major depression during their
lifetime. DSM-IV distinguishes bipolar disorders from depressive disorders. Bipolar
disorders have dramatic clinical manifestations that involve one or more
episodes of hypomania during an individual's lifetime alternating with illness
episodes that fit the criteria for major depressive disorder.

Depression was previously defined by the Research Diagnostic Criteria (RDC)
(21), a nonclinical forerunner of the current DSM-IV criteria. The RDC used
criteria similar to those of the DSM-IV to define "major depression" but, unlike the
DSM-IV, also included criteria to define "minor depression" (nonpsychotic
episodes of illness characterized by a prominent and sustained dysphoria but lacking
all of the symptomatic features of major depression). Although important
historically, the RDC has been superseded by the DSM-IV.

One of the problems associated with the definition of depression is that, in
addition to these diagnosable clinical entities, there are multiple sources of
affective distress that may result in short-term or self-limiting expressions of
depressive symptoms without meeting the DSM-IV criteria outlined above. The
best available data on rates of clinically diagnosable depressive disorders in the
U.S. general population come from the National Institute of Mental Health's
Epidemiologic Catchment Area (ECA) study (19). ECA study investigators
found that, even though clinically diagnosable depressive disorders are relatively
rare, usually affecting only 5%-6% of the general female population during any
12-month period, the reporting of depressive symptoms is reasonably frequent,
with 35.7% of the women in the ECA study (19) reporting having experienced
a period of dysphoria (feeling sad or blue) lasting at least 2 weeks. These
expressions of affective distress, which fail to meet the clinical criteria for major
depression, dysthymia, or bipolar illness, are often associated with occurrences
such as uncomplicated grief, medical illness and other life events, or chronic
difficulties (22). Depressive symptoms may also occur secondary to other
psychiatric illnesses (i.e., anxiety disorders or phobias), chronic medical conditions,
or substance abuse.

Monitoring Depressive Symptoms in the P-1 Study

The primary instrument used to monitor depressive symptoms in the P-1 study
was the CES-D (17). This self-administered questionnaire was designed to be a
brief, first-stage screen rather than a clinical diagnostic instrument. The CES-D
is composed of 20 items, each of which is scored on a scale of 0-3. Higher scores
reflect increased expression of affective distress, and a score of 16 or higher is
most often used as the cutoff point for likely cases of clinical depression
(17,23,24).
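
The scoring rule described above can be illustrated with a minimal Python sketch. It assumes that each of the 20 item responses has already been coded 0-3 (including any reverse-keyed items, which the text does not discuss); the function and constant names are hypothetical and are not taken from the P-1 data system.

```python
from typing import Sequence

CESD_ITEMS = 20    # number of items on the questionnaire (from the text)
CESD_CUTOFF = 16   # total score at or above which a case is flagged (from the text)


def cesd_total(item_scores: Sequence[int]) -> int:
    """Sum the 20 item scores; each item is assumed to be already coded 0-3."""
    if len(item_scores) != CESD_ITEMS:
        raise ValueError(f"expected {CESD_ITEMS} items, got {len(item_scores)}")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def meets_cutoff(item_scores: Sequence[int]) -> bool:
    """Return True when the total reaches the screening cutoff of 16."""
    return cesd_total(item_scores) >= CESD_CUTOFF
```

Under this coding the total ranges from 0 to 60, so the cutoff of 16 sits in the lower part of the scale, which is consistent with the instrument's role as a first-stage screen rather than a diagnosis.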

Two problems are associated with the use of the CES-D alone to screen for
clinically diagnosable episodes of depression. First, questions on the CES-D
inquire only about the past 7 days, collecting little information on the length of
time that a symptom has been present. Second, the CES-D collects information
only on symptoms and not the degree of social impairment experienced by the
respondent. Consequently, scores above the CES-D clinical cutoff point of 16
tend to include a proportion of distressed individuals, perhaps upwards
of one half or more, who do not meet the clinical criteria for major
depression, dysthymia, or bipolar illness (24,25).

Estimating Depression Risk in P-1 Study Participants

The eligibility criteria for the P-1 study permitted, at the discretion of the local
site investigator, the inclusion of women with evidence of clinical depression.
Twenty percent to 22% of the participants scored 16 or higher on the CES-D at least once
during any 12-month period of the P-1 study. This percentage exceeds the
expected general population rates [5%-6% (19)] of clinically diagnosable
depressive disorders over a 12-month period by 3.5-4.0 times, indicating that it is
necessary to distinguish between clinically diagnosable episodes of depression
and depressive symptoms that are secondary to other types of physical and
psychiatric illnesses or a consequence of social conditions that produce
short-term, self-limiting expressions of affective distress. The preferred means to make
such a distinction would be a standardized psychiatric interview, such as the
Schedule for Affective Disorders and Schizophrenia-Lifetime Version (26) or
the Diagnostic Interview Schedule (19). However, in the absence of such an
interview, the best single indicator of risk for a future episode of major
depression, dysthymia, or bipolar disorder in the P-1 study data is a medical history of
treatment for these disorders (27-30).

The ECA study (19) found that the mean age at onset for major depressive
disorders in the general population was 27 years, with approximately 89% of all
first depressive episodes occurring before age 35 years, which was the lower age
limit of the participants in the P-1 study. Medical history information, collected
on a one-time-only basis as a part of the baseline entry and eligibility assessment
of all P-1 study participants, included three self-reported items regarding
depression: 1) a medical history of depression, 2) current or previous prescriptions
for antidepressant medications, and 3) extended periods (≥12 months) of
dysphoric mood (i.e., "depressed or sad most days"). If a participant gave a positive
answer to the medical history or the medication question, the interviewer
obtained dates of treatment, physicians' names, specific modalities of treatment,
and date of last medication dose to assess the consistency and appropriateness of
the information provided.

These three medical history items were used in the current study to prospectively
estimate each participant's risk of experiencing a clinically diagnosable
episode of depression. A simple three-level risk score was determined for each
P-1 study participant, depending on whether they endorsed 0 (low risk), 1 or 2
(medium risk), or 3 (high risk) of the medical history items regarding depression
in the Entry/Eligibility Form. We hypothesized that women with higher scores
on this simple depression risk scale would experience more severe and persistent
episodes of affective distress and would be more likely to receive a clinical
diagnosis of depression. Moreover, if tamoxifen was associated with the onset
and/or prolonged the length of depressive episodes in the high-risk (i.e., more
vulnerable) group, it should be apparent from longitudinal differences in the
proportion of P-1 study participants in the treatment groups (tamoxifen versus
placebo) who scored 16 or higher on the CES-D.
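
The three-level risk score described above can be sketched in a few lines. This is
only an illustration of the stated rule (0 items = low, 1 or 2 = medium, 3 = high);
the function and type names are hypothetical and do not come from the study's
data system.

```python
from enum import Enum


class DepressionRisk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def depression_risk_group(history: bool, medication: bool, dysphoria: bool) -> DepressionRisk:
    """Map the three yes/no medical history items to a risk group.

    Each positive answer is worth 1 point: 0 points -> low,
    1 or 2 points -> medium, 3 points -> high.
    """
    items_endorsed = int(history) + int(medication) + int(dysphoria)
    if items_endorsed == 0:
        return DepressionRisk.LOW
    if items_endorsed == 3:
        return DepressionRisk.HIGH
    return DepressionRisk.MEDIUM
```

For example, a participant reporting only a prior antidepressant prescription
would fall in the medium-risk group under this rule.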
Statistical Analysis


CES-D scores were analyzed as above or below the clinical cutoff of 16 or
higher. Binary logistic regression was the primary method of statistical analysis
used in this study. Estimated odds ratios (ORs), confidence intervals (CIs), and
P values are provided for all inferential analyses. Cox regression analysis was
used to investigate the effects of treatment and depression risk on the time to the
first CES-D with a score of 16 or higher, and Kaplan-Meier curves are provided
for these data. When the CES-D data were handled as a continuous variable,
nonparametric equivalents to a one-way analysis of variance (i.e., Kruskal-Wallis
test) were used because it is unusual for CES-D scores to be normally distributed.
Graphic presentations include 95% CIs on observed proportions to provide
the reader with visual criteria for the magnitude of potential variation. Reported
P values are all two-sided and have not been adjusted for multiple statistical
comparisons. Instead, we have chosen to focus on consistent patterns of findings
rather than on individual statistical tests in forming our conclusions. We also
avoided the use of statistical methods for imputation of missing data points in the
primary data because the data did not meet the strong assumptions that normally
underlie such procedures (e.g., MCAR [i.e., Missing Completely at Random]/
MAR [i.e., Missing at Random]). Analyses were carried out with the use of
Minitab (Version 13; State College, PA) and Egret (Version 1.0; Cytel Corp.,
Cambridge, MA).
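
As a rough illustration of the first and sixth steps above, the sketch below
dichotomizes CES-D scores at the cutoff of 16 and attaches a 95% CI to the
observed proportion. The text does not say which interval method was used, so
the normal-approximation (Wald) interval here is an assumption, and the scores
are hypothetical values chosen only to make the example run.

```python
import math

CESD_CUTOFF = 16  # clinical cutoff described in the text


def above_cutoff(score: int) -> bool:
    """True when a CES-D total score is 16 or higher."""
    return score >= CESD_CUTOFF


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Observed proportion with a normal-approximation (Wald) interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical CES-D scores for ten participants at one examination.
scores = [3, 16, 22, 9, 15, 31, 0, 12, 18, 7]
n_above = sum(above_cutoff(s) for s in scores)
p, lower, upper = proportion_ci(n_above, len(scores))
```

With so few observations the interval is very wide; the study's group sizes are
what make its plotted intervals informative.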


Results 


Depression  Risk 


To determine whether there was an association between depression
and tamoxifen treatment in participants of the P-1
study, we first calculated the depression risk score from the
frequency of responses to each one of the medical history items
(Table 1). The three components of this score were only moderately
intercorrelated. The highest correlation occurred between
a history of illness and antidepressant medications (r = .564;
P<.001), followed by history of illness and persistent dysphoric
mood (r = .369; P<.001) and medications and dysphoric mood
(r = .269; P<.001). Overall depression risk, measured by the
data  from  this  study,  was  not  statistically  significantly  related  to 
the  participants’  risk  of  breast  cancer,  as  measured  by  the  Gail 
risk  model  (I2JI). 
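
The three correlations quoted above can be checked against the response-pattern
counts in Table 1. The sketch below assumes that r is the phi coefficient (the
Pearson correlation between two yes/no items), which the text does not state
explicitly. The counts are those printed in Table 1; as printed they sum to
11 065, one more than the stated total, so agreement is to rounding only.

```python
import math

# (history, medication, dysphoria) -> number of participants, as printed in Table 1
PATTERN_COUNTS = {
    (0, 0, 0): 7964,
    (0, 0, 1): 621,
    (0, 1, 0): 668,
    (1, 0, 0): 339,
    (0, 1, 1): 120,
    (1, 0, 1): 203,
    (1, 1, 0): 631,
    (1, 1, 1): 519,
}


def phi(i: int, j: int) -> float:
    """Phi coefficient between item i and item j (0=history, 1=medication, 2=dysphoria)."""
    n = sum(PATTERN_COUNTS.values())
    n_i = sum(c for k, c in PATTERN_COUNTS.items() if k[i])
    n_j = sum(c for k, c in PATTERN_COUNTS.items() if k[j])
    n_ij = sum(c for k, c in PATTERN_COUNTS.items() if k[i] and k[j])
    numerator = n * n_ij - n_i * n_j
    denominator = math.sqrt(n_i * (n - n_i) * n_j * (n - n_j))
    return numerator / denominator
```

Under that assumption, `phi(0, 1)`, `phi(0, 2)`, and `phi(1, 2)` round to the
three values reported in the paragraph above.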

The construct validity of this depression risk score was evaluated,
in part, with the use of the social and demographic factors
associated with clinically diagnosable depressive disorders in
the ECA study (19). Table 2 shows the distribution of the P-1
study participants according to the three-level depression risk
scale on seven demographic variables, which approximate those
associated with clinically diagnosable depression in the ECA
study (19). All of these variables, except education, showed a
statistically significant dose-response relationship to the depression
risk scores in terms of the direction and intensity of the
association.

CES-D  Data 


Fig. 1, a, shows the proportion of the participants in each
depression risk group who scored above the clinical cutoff of 16
or higher on the CES-D Scale at baseline and at each of the
follow-up examinations. A consistent, positive dose-response
relationship was seen between depression risk, as determined on
the basis of the medical history items, and the proportion of
participants scoring 16 or higher on the CES-D Scale at each
scheduled examination. For each depression risk group, Table 3
shows the mean proportion of follow-up examinations with
scores of 16 or higher and the distribution of the maximum and
the overall scores on CES-D examinations above the clinical
cutpoint. A positive dose-response relationship was also observed
between depression risk group and proportion of respondents
who scored 16 or higher on sequential CES-D examinations.
In the high-risk depression group, for example, 21.2% of
the respondents scored 16 or higher on three or more sequential
CES-D examinations, compared with 9.7% for the medium-risk
group and 3.5% for the low-risk group (data not shown). These
findings confirm the expectation that participants in the higher
depression risk groups (high>medium>low), on average, tend
to experience more persistent and severe episodes of affective
distress.

We next analyzed the CES-D data from each depression risk
group by treatment group (tamoxifen versus placebo) (Fig. 1,
b-d; Table 4). After adjustment for examination and risk group,
the results of a logistic regression found that there was a
statistically nonsignificant effect for the tamoxifen group compared
with the placebo group (OR = 0.98; 95% CI = 0.93 to 1.02;
P = .32). These analyses indicate that treatment group is not
statistically associated with the proportion of women scoring
above the CES-D clinical cutoff of 16 or higher in any of the
three depression risk groups. Furthermore, after adjustment
for depression risk group, an analysis of variance found that




Table 1. Distribution of self-reported risk factors for clinical depressive disorders at baseline examination among participants of the
National Surgical Adjuvant Breast and Bowel Project's Breast Cancer Prevention (P-1) Study

Risk group          History of    Antidepressant   Persistent    Risk factor pattern    0-3 risk factors†
(items endorsed)*   depression    medications      dysphoria     No.        %           No.        %

Low (0)             No            No               No             7964      72.0         7964      72.0

Medium (1)          No            No               Yes             621       5.6
                    No            Yes              No              668       6.0         1628      14.7
                    Yes           No               No              339       3.1

Medium (2)          No            Yes              Yes             120       1.1
                    Yes           No               Yes             203       1.8          953       8.6
                    Yes           Yes              No              631       5.7

High (3)            Yes           Yes              Yes             519       4.7          519       4.7

Total                                                            11064     100.0        11064     100.0

*Depression risk groups were assigned on the basis of the participants' responses to three medical history questions: 1) history of depression, 2) use of
antidepressant medication, and 3) persistent mood disturbance (dysphoria). Each positive answer was worth 1 point. Participants with a score of 0 were assigned
to the low-risk group, those with a score of 1-2 to the medium-risk group, and those with a score of 3 to the high-risk group.
†Number and percent of participants endorsing 0, 1, 2, or 3 depression risk factors.
Table 2. Distribution of NSABP P-1 participants on ECA study social and demographic correlates of clinically diagnosed depressive
disorders by depression risk score*

                                            Depression risk score†
Sociodemographic item                  Low, %     Medium, %      High, %        Odds ratio‡   Confidence interval
                                                                                              on odds ratio

Marital status: divorced/separated     11.1       17.7           23.5           1.63          1.50 to 1.98
Employment status: not working          7.9       12.2           [illegible]    1.78          1.58 to 2.01
Visited doctor within last 3 mo        71.0       76.4           84.4           1.39          1.28 to 1.51
Hospitalized within last 5 y           42.7       48.6           54.9           1.27          1.19 to 1.36
Age: ≥60 y                             29.9       [illegible]    24.1           0.87          0.81 to 0.94
Education: >high school                66.6       66.7           70.0           1.04          0.97 to 1.12
Income: >median                        46.1       37.6           31.3           0.72          0.67 to 0.77

*NSABP P-1 = National Surgical Adjuvant Breast and Bowel Project's Breast Cancer Prevention (P-1) Study; ECA = National Institute of Mental Health's
Epidemiological Catchment Area study (19).

†Depression risk groups were assigned on the basis of the participants' responses to three medical history questions: 1) history of depression, 2) use of
antidepressant medication, and 3) persistent mood disturbance (dysphoria). Each positive answer was worth 1 point. Participants with a score of 0 were assigned
to the low-risk group, those with a score of 1-2 to the medium-risk group, and those with a score of 3 to the high-risk group.
‡Odds ratios were determined by binary logistic regression; P<.001 for all groups compared with referent groups, except for education, where P = .235.


there was no difference in the mean individual proportion of
follow-up examinations above the clinical cutoff in each
treatment arm.

The Kaplan-Meier plot in Fig. 2 shows the relationship between
assigned treatment (placebo versus tamoxifen) and depression
risk group (high, medium, or low) for the time from
randomization until the first CES-D examination with a score
exceeding the clinical cutoff of 16 or higher. The results of Cox
proportional hazards regression analysis with these data were
statistically significant for depression risk group (likelihood


Fig. 1. Proportion of participants in the National Surgical Adjuvant Breast and
Bowel Project's Breast Cancer Prevention (P-1) Study scoring 16 or higher on
the Center for Epidemiological Studies-Depression (CES-D) scale with 95%
confidence intervals by depression risk groups (low, medium, or high) (a) and by
depression risk group and treatment assignment (placebo versus tamoxifen
[TAM]) (b-d). Depression risk groups were assigned on the basis of the
participants' responses to three medical history questions: 1) history of depression,
2) use of antidepressant medication, and 3) persistent mood disturbance
(dysphoria). Each positive answer was worth 1 point. Participants with a score of 0
were assigned to the low-risk group, those with a score of 1-2 to the
medium-risk group, and those with a score of 3 to the high-risk group.
Table 3. Distribution of Center for Epidemiological Studies-Depression
(CES-D) Scale variables for NSABP P-1 participants who scored above the
clinical cutoff of 16 or higher by depression risk group*

                                        Depression risk group†
CES-D variable                  Low               Medium            High

% follow-up examinations in which participants scored ≥16‡
  Mean                          0.087             0.192             0.357
  95% CI for mean               0.083 to 0.091    0.181 to 0.203    0.325 to 0.389

Maximum score ≥16‡
  Median                        22                24                27
  Mean                          23.97             25.61             28.38
  95% CI for mean               23.66 to 24.28    25.16 to 26.06    27.62 to 29.54

All scores ≥16‡
  Median                        20                21                22
  Mean                          21.52             22.49             23.74
  95% CI for mean               21.30 to 21.74    22.17 to 22.81    23.10 to 24.38

*The CES-D is a self-administered questionnaire, composed of 20 items, each
of which is scored on a scale of 0-3. Higher scores reflect increased expression
of affective distress, and a total score of 16 or higher is used as the cutoff point
for likely cases of clinical depression (17,23,24). NSABP P-1 = National Surgical
Adjuvant Breast and Bowel Project's Breast Cancer Prevention (P-1)
Study; CI = confidence interval.

†Depression risk groups were assigned on the basis of the participants'
responses to three medical history questions: 1) history of depression, 2) use of
antidepressant medication, and 3) persistent mood disturbance (dysphoria). Each
positive answer was worth 1 point. Participants with a score of 0 were assigned
to the low-risk group, those with a score of 1-2 to the medium-risk group, and
those with a score of 3 to the high-risk group.

‡There is a statistically significant difference between all groups
(Kruskal-Wallis and analysis of variance: P<.001). "Maximum score ≥16" represents the
highest single CES-D score ≥16 reported for an individual, whereas "All scores
≥16" summarizes all of the CES-D scores ≥16 reported for an individual.


ratio statistic [LRS] P<.001; hazard ratio [HR] = 1.88; 95% CI
= 1.74 to 2.05), but they were statistically nonsignificant for
both treatment arm effects (LRS P = .988; HR = 1.00; 95% CI
= 0.92 to 1.09) and interaction effects (LRS P = .575; HR =
1.03; 95% CI = 0.92 to 1.16). The proportional hazards
assumption for this analysis was confirmed.
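
A quick internal-consistency check on the reported hazard ratios is possible if
one assumes the intervals are Wald intervals, which are symmetric on the log
scale; the text does not state how they were computed, so this is an assumption.
Under it, each point estimate should sit close to the geometric mean of its
bounds, and the standard error of the log hazard ratio can be recovered from the
interval width. The helper name below is hypothetical.

```python
import math


def wald_check(lower: float, upper: float, z: float = 1.96) -> tuple[float, float]:
    """Return (geometric midpoint of the bounds, implied SE of the log estimate)."""
    midpoint = math.sqrt(lower * upper)
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return midpoint, se_log


# (HR, lower, upper) as reported for the Cox model
reported = {
    "depression risk group": (1.88, 1.74, 2.05),
    "treatment arm": (1.00, 0.92, 1.09),
    "interaction": (1.03, 0.92, 1.16),
}

midpoints = {name: wald_check(lo, hi)[0] for name, (_, lo, hi) in reported.items()}
```

All three geometric midpoints fall within about 0.01 of the reported hazard
ratios, which is what rounding of the bounds to two decimals would allow.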

Missing  Data 

We next assessed the association between missing data and
depression risk group or sequential CES-D examination (Fig. 3,
a). Logistic regression analysis based on the data in Fig. 3, a,
indicated that depression risk group (OR = 1.17; 95% CI =
1.13 to 1.21; P<.001) and sequential examination (OR = 1.45;
95% CI = 1.44 to 1.46; P<.001) were both statistically
significantly associated with missing CES-D data. Panels b-d in Fig. 3
show the proportion of participants completing the CES-D by
depression risk and treatment groups. Logistic regression analysis
by depression risk, controlling for sequential examination,
indicates that, compared with placebo treatment, tamoxifen
treatment was associated with higher proportions of missing data
in the low-risk group (OR = 1.11; 95% CI = 1.06 to 1.16;
P<.001) and the medium-risk group (OR = 1.12; 95% CI =
1.04 to 1.21; P<.001) but not in the high-risk group (OR = 0.99;
95% CI = 0.84 to 1.16; P = .91). If tamoxifen-associated
depression were the primary cause of these missing data, we
would have predicted a positive dose-response increase in the
magnitude of the ORs from the lowest to the highest depression
risk group.
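
The reasoning in this paragraph can be made explicit with the reported values:
an interval that excludes 1 marks an association with missing data, and a
dose-response pattern would require the ORs to rise from the low- to the
high-risk group. The sketch below only restates that logic; the helper names are
hypothetical.

```python
# (OR, lower, upper) for tamoxifen versus placebo on missing CES-D data,
# ordered from the lowest to the highest depression risk group
missing_data_or = {
    "low": (1.11, 1.06, 1.16),
    "medium": (1.12, 1.04, 1.21),
    "high": (0.99, 0.84, 1.16),
}


def excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True when the confidence interval does not contain the null value."""
    return lower > null or upper < null


def strictly_increasing(values: list[float]) -> bool:
    """True when each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


associated = {g: excludes_null(lo, hi) for g, (_, lo, hi) in missing_data_or.items()}
dose_response = strictly_increasing([or_ for or_, _, _ in missing_data_or.values()])
```

Here the low- and medium-risk intervals exclude 1, the high-risk interval does
not, and the ORs do not increase across groups.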

We noted in our previous report (15) that it was difficult to
continue to collect quality-of-life data after a participant had
gone off treatment. However, participants in the P-1 study were
asked about their primary reason for going off treatment, and
their responses were recorded on an Off Therapy Form (OTF)
that included "depression" as one of 10 specific response
categories.

Of the 11064 participants in this cohort, we collected an OTF
for 3539 (80.8%) of 4382 women who missed at least one
CES-D examination. The presence of an OTF showed a moderate
positive correlation with the total number of missing
CES-D examinations (r = .62; P<.001). The women who completed
an OTF accounted for 12693 (89.7%) of 14149 missing

Table 4. Comparison (binary logistic regression) of the proportion of NSABP P-1 participants in each treatment group (tamoxifen versus placebo) who scored
16 or higher on the Center for Epidemiological Studies-Depression (CES-D) Scale by depression risk group and sequential examination*

                                                                  Sequential examination
Depression
risk group†      Baseline              3 mo           6 mo           12 mo          18 mo          24 mo          30 mo          36 mo

Low
  OR‡            1.22                  1.04           1.01           1.02           0.88           0.96           0.93           0.86
  95% CI         0.96 to 1.55          0.96 to 1.25   0.85 to 1.19   0.86 to 1.02   0.73 to 1.04   0.80 to 1.13   0.78 to 1.12   0.71 to 1.03
  P              .10                   .68            .91            .86            .14            .60            .44            .11

Medium
  OR‡            1.03                  1.20           1.10           0.99           0.91           1.04           0.96           1.01
  95% CI         0.8[illegible] to 1.30  1.04 to 1.60   0.89 to 1.35   0.81 to 1.22   0.74 to 1.13   0.82 to 1.30   0.75 to 1.22   0.79 to 1.29
  P              .84                   .02            .39            .95            .40            .76            .72            .94

High
  OR‡            0.89                  0.78           0.74           0.62           0.84           0.83           1.00           1.00
  95% CI         0.61 to 1.30          0.54 to 1.14   0.50 to 1.09   0.41 to 0.92   0.56 to 1.26   0.54 to 1.28   0.65 to 1.54   0.64 to 1.57
  P              .5[illegible]         .21            .13            .02            .40            .40            .99            .90

*The CES-D is a self-administered questionnaire, composed of 20 items, each of which is scored on a scale of 0-3. Higher scores reflect increased expression
of affective distress, and a total score of 16 or higher is used as the cutoff point for likely cases of clinical depression (17,23,24). NSABP P-1 = National Surgical
Adjuvant Breast and Bowel Project's Breast Cancer Prevention (P-1) Study; OR = odds ratio; CI = confidence interval.

†Depression risk groups were assigned on the basis of the participants' responses to three medical history questions: 1) history of depression, 2) use of
antidepressant medication, and 3) persistent mood disturbance (dysphoria). Each positive answer was worth 1 point. Participants with a score of 0 were assigned
to the low-risk group, those with a score of 1-2 to the medium-risk group, and those with a score of 3 to the high-risk group.
‡OR >1.0 indicates a greater proportion of women in the tamoxifen group.
Discussion


Fig. 2. Kaplan-Meier curves of time from randomization to first score of 16 or
higher on the Center for Epidemiological Studies-Depression (CES-D) Scale
by depression risk group (low, medium, or high) and treatment assignment
(placebo versus tamoxifen [TAM]). Depression risk groups were assigned on the
basis of the participants' responses to three medical history questions: 1) history
of depression, 2) use of antidepressant medication, and 3) persistent mood
disturbance (dysphoria). Each positive answer was worth 1 point. Participants
with a score of 0 were assigned to the low-risk group, those with a score of 1-2
to the medium-risk group, and those with a score of 3 to the high-risk group.
At 10 months, for the patients who received tamoxifen, in the low-risk group
there were 3150 patients at risk of depression (proportion remaining = 0.864;
95% confidence interval [CI] = 0.853 to 0.875); in the medium-risk group there
were 799 patients at risk (proportion remaining = 0.685; 95% CI = 0.659 to
0.711); and in the high-risk group there were 123 patients at risk (proportion
remaining = 0.488; 95% CI = 0.427 to 0.549). At 30 months, for the patients
who received tamoxifen, in the low-risk group there were 2233 patients at risk
for depression (proportion remaining = 0.746; 95% CI = 0.732 to 0.760); in the
medium-risk group there were 496 patients at risk (proportion remaining =
0.528; 95% CI = 0.499 to 0.557); and in the high-risk group there were
61 patients at risk (proportion remaining = 0.317; 95% CI = 0.258 to 0.376).
At 10 months, for the patients who received the placebo, in the low-risk group
there were 3190 patients at risk for depression (proportion remaining = 0.870;
95% CI = 0.859 to 0.881); in the medium-risk group there were 863 patients at
risk (proportion remaining = 0.713; 95% CI = 0.688 to 0.738); and in the
high-risk group there were 108 patients at risk (proportion remaining = 0.475;
95% CI = 0.412 to 0.538). At 30 months, for the patients who received the
placebo, in the low-risk group there were 2326 patients at risk for depression
(proportion remaining = 0.753; 95% CI = 0.738 to 0.767); in the medium-risk
group there were 544 patients at risk (proportion remaining = 0.535; 95% CI =
0.506 to 0.563); and in the high-risk group there were 59 patients at risk
(proportion remaining = 0.316; 95% CI = 0.254 to 0.377).


CES-D examinations. Only 110 (3.1%) of these 3539 women
reported that depression was the primary reason for their going
off therapy. The most frequent reasons for going off therapy
were nonmedical in nature (1667 women [47.1%]), perceived
toxic effects (921 women [26.0%]), and various protocol and
nonprotocol medical conditions (841 women [23.8%]).

Table 5 shows the distribution of women who reported that
depression was their primary reason for going off treatment by
treatment group and depression risk group. An analysis of these
data using binary logistic regression found a statistically
significant effect for depression risk group (OR = 2.37; 95% CI =
1.83 to 3.07; P<.001) and a statistically nonsignificant effect for
treatment group (OR = 1.10; 95% CI = 0.75 to 1.62; P = .63),
indicating that the cases of depression that led women to quit
their assigned treatment did not occur with a greater frequency
in those in the tamoxifen arm.
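
The percentages in Table 5 follow directly from its counts, and a crude
(unadjusted) odds ratio for treatment can be formed by pooling over risk groups.
The sketch below does both. The crude ratio is not the model reported above,
which also includes depression risk group, so it should only be expected to land
near the reported OR of 1.10. The function names are hypothetical.

```python
# (depression, other reasons) counts by risk group and treatment arm, from Table 5
OFF_TREATMENT = {
    ("low", "placebo"): (20, 1130),
    ("low", "tamoxifen"): (27, 1275),
    ("medium", "placebo"): (21, 416),
    ("medium", "tamoxifen"): (24, 431),
    ("high", "placebo"): (9, 83),
    ("high", "tamoxifen"): (9, 94),
}


def percent_depression(depression: int, other: int) -> float:
    """Depression as a percentage of all off-treatment reasons in one cell."""
    return 100 * depression / (depression + other)


def pooled_counts(arm: str) -> tuple[int, int]:
    """Sum depression and other-reason counts over risk groups for one arm."""
    dep = sum(d for (_, a), (d, _) in OFF_TREATMENT.items() if a == arm)
    oth = sum(o for (_, a), (_, o) in OFF_TREATMENT.items() if a == arm)
    return dep, oth


def crude_odds_ratio() -> float:
    """Unadjusted odds ratio of citing depression, tamoxifen versus placebo."""
    dep_t, oth_t = pooled_counts("tamoxifen")
    dep_p, oth_p = pooled_counts("placebo")
    return (dep_t / oth_t) / (dep_p / oth_p)
```

The cell percentages reproduce the bottom row of Table 5, and the crude odds
ratio comes out a little below the adjusted estimate.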


Tamoxifen is the most widely prescribed anticancer agent
currently in use. It has been proven to be effective against breast
cancer as an adjuvant treatment and in a preventative setting
(12,32). Given the widespread use of tamoxifen, it is important
to fully investigate all of the potential side effects that may be
associated with its administration, so that women, together with
their physicians, can make an informed decision regarding its
potential costs and benefits and its appropriateness for their
individual situations.

This study is an extension of our earlier report (16) on the
HRQL data from the NSABP P-1 study. Previously, we found
no evidence for an association between tamoxifen treatment and
depression in the overall P-1 study cohort. In this study, we
recognized that vulnerability to clinically identifiable depressive
disorders is not randomly distributed in the general female
population and that the effects of tamoxifen on susceptible women in
the P-1 study may have previously gone undetected.

Our initial problem was the a priori identification of subgroups
of women with a potential clinical susceptibility for
depression. Because the self-administered depression-screening
form (CES-D) used in the P-1 study provides information on
short-term symptoms of affective distress and is not intended for
use as a diagnostic instrument (17), we incorporated the
participants' self-reported medical history of depression, use of
prescription antidepressant medications, and experience of
extended periods (>12 months) of dysphoric mood to assign
clinical risk. On the basis of these data, women were
prospectively assigned to one of three depression risk groups. We
hypothesized that the higher a woman's depression risk group, the
greater the likelihood that she would experience a clinically
diagnosable episode of depression.

The P-1 study staff were trained to check the consistency and
appropriateness of the self-reported data about prior treatment
for depression and the use of antidepressant medications as a
routine part of the medical screening procedure carried out
during entry/eligibility interview. These procedures were designed
specifically to minimize false-positive classification errors.
However, there was little that the interviewer could do to detect
false-negative classification errors in which a potential
participant did not, for whatever reason, report the requested screening
information. The overall effect of this inability to control for
false-negative classification errors for the current study was to
create a potential misclassification bias in which women at
increased risk for depression may have been placed, at an
unknown rate, in one of the lower risk groups. Although less than
ideal, the effect of this bias is conservative in nature, operating
to maintain the comparative validity of the most important
high-risk depression group.

We found a statistically significant dose-response relationship
between the level of the depression risk group
(high>medium>low) and the proportion of the women in each
depression risk group who scored above the clinical cutoff of 16 or
higher on the CES-D at baseline and at every follow-up
interview. In addition, women in the higher risk groups
(high>medium>low) scored above the clinical cutoff on a greater
proportion of their follow-up interviews and, on average, had higher
maximum CES-D scores. Together, these data suggest that there
was a dose-response effect, in which women in the higher
depression risk groups (high>medium>low) were more likely to
Fig. 3. [...] quality-of-life questionnaire by depression risk groups (low, medium, or
high) (a) and by depression risk group and treatment assignment (placebo versus
tamoxifen [TAM]) with 95% confidence intervals (b-d). Depression risk groups
were assigned on the basis of the participants' responses to three medical history
questions: 1) history of depression, 2) use of antidepressant medication, and
3) persistent mood disturbance (dysphoria). Each positive answer was worth
1 point. Participants with a score of 0 were assigned to the low-risk group, those
with a score of 1-2 to the medium-risk group, and those with a score of 3 to the
high-risk group.


Table 5. Reasons cited for going off treatment by depression risk* and treatment group

                                                   Low risk              Medium risk            High risk
Reasons cited for going off treatment          Placebo   Tamoxifen    Placebo   Tamoxifen    Placebo   Tamoxifen    Overall

Depression (No. of participants)                    20          27         21          24          9           9        110
Other reasons (No. of participants)               1130        1275        416         431         83          94       3429
Depression as % of all off-treatment reasons       1.7         2.1        4.8         5.3        9.8         8.7        3.1

*Depression risk groups were assigned on the basis of the participants' responses to three medical history questions: 1) history of depression, 2) use of
antidepressant medication, and 3) persistent mood disturbance (dysphoria). Each positive answer was worth 1 point. Participants with a score of 0 were assigned
to the low-risk group, those with a score of 1-2 to the medium-risk group, and those with a score of 3 to the high-risk group.


experience clinically significant episodes of affective distress
and that these episodes, on average, were more persistent and
more severe than the episodes in the lower risk groups. Finally,
we found that the distribution of social and demographic
correlates (i.e., age, marital and employment status, educational level,
and use of medical services) across the three depression risk
groups defined in this study followed the same general patterns
of risk previously identified in the ECA study of depression
among the general population (27). All of the above findings
serve to support the validity of the risk assignments used in our
study.


The  primary  test  of  our  research  question  involved  stratifying 
each  depression  risk  group  by  treatment  assignment  (tamoxifen 
versus  placebo)  and  comparing  the  corresponding  proportions  of 
women  at  each  follow-up  interview  who  scored  above  the  clini¬ 
cal  cutoff  of  16  or  higher  on  the  CES-D.  We  found  no  effect  of 
tamoxifen  for  any  of  the  three  depression  risk  groups. 

Besides the lack of a positive association between tamoxifen
use and depression, there are at least two possible alternative
explanations for our negative findings: lack of statistical power
and missing data. We carried out a post hoc effect size analysis
to determine the size of the difference between the treatment
arms that might have been detected. For our highest risk depression
group (n = 519), we had an 80% chance of detecting at
least a 37% (OR >1.37) increase between the two study arms in
the proportions of women scoring above the CES-D clinical
cutoff of 16 or higher at any single examination point. When a
repeated measures design was used, we had sufficient power to
detect a mean increase of 24% (OR ≥1.24) in the proportion of
women in either arm scoring above the CES-D clinical cutoff
(33,34). We considered these to be acceptable levels of statistical
power for the identification of clinically significant treatment
effects in our high-risk depression group. The detectable ORs
were, of course, even smaller for the low- and medium-risk
depression groups.

We also assessed the contribution of missing data to explain
the negative association between tamoxifen and depression in
the P-1 study. An initial analysis showed that assigned depression
risk was statistically significantly associated with missing
data rates over the course of the study. If a tamoxifen-associated
depression was the primary cause of these rates, we would have
predicted that the tamoxifen treatment group in the higher
depression risk groups would show a progressively greater
differential off-treatment rate than the placebo group. This expectation
was not confirmed by our data for the high-risk depression
group.

In addition, we also examined the reasons given for going off
the assigned treatment. There was a strong statistical association
in the P-1 study between stopping assigned treatment and missing
HRQL data (16). An analysis of the reasons for going off
treatment in 81% of the women with missing HRQL data resulted
in the following observations: (a) Depression was cited as
a relatively infrequent reason for going off treatment; (b) the
higher the depression risk group, the greater the likelihood that
depression was cited as the reason for going off treatment; and
(c) within each depression risk group, depression was cited as
the reason for going off treatment by similar proportions of
women, regardless of treatment assignment. A separate report
(35) has implemented a sensitivity analysis on these data with
equally negative results. The findings in our report together with
this sensitivity analysis indicate that there are no clear patterns
in the missing data that serve to undermine the conclusions
drawn from our primary analysis.

The results of our analysis strengthen our previous conclusion
regarding lack of evidence for an association between tamoxifen
use and depression in the P-1 study data by provisionally
extending our findings to subgroups of women at a high risk for
clinically identifiable episodes of depression. Clinically, these
findings have two major implications. First, the evidence from
NSABP's P-1 study does not lend support to the idea that
tamoxifen should be considered to be a causal risk factor for the
onset of depressive symptoms and/or the prolongation of
depressive episodes that occur among treated women. Second, the
findings of this study suggest that physicians need not
automatically disqualify women as candidates for tamoxifen treatment
simply because they report a history of depressive symptoms or
prior treatment for a depressive disorder. Nevertheless, it is still
essential that physicians carefully screen for affective disorders
and treat or refer potential cases of depression encountered in
routine clinical practice.

Finally, there are two important limitations on these conclusions
that require discussion, one statistical and the other
methodological. Statistically, it was the large size of the P-1 study
that permitted us to identify and carry out stratified analyses of
groups of women with a differential risk for depression. However,
we also noted that there were limits on our statistical power
to detect an increase in the proportion of women reporting
clinically significant levels of depressive symptoms on the CES-D,
particularly in the high-risk depression group. For this reason,
we cannot absolutely exclude the possibility that there may be
rare cases in which women react negatively to tamoxifen
treatment with potentially life-threatening depressions. Here, it is
useful to recall that data on neuro-mood toxic effects were
collected for P-1 study participants and periodically reviewed as
part of the routine safety-monitoring procedures. Over the full
course of the P-1 study, there were a total of three women who
committed suicide, one woman from the placebo-treated group
and two women from the tamoxifen-treated group, and there
were no statistically significant differences in the distribution of
women reporting suicidal ideation across the two trial arms.

The methodological limitations of this article (i.e., the lack of
standardized psychiatric diagnoses and missing HRQL data) are
primarily due to the fact that the goals of this study were
secondary to the main clinical objectives that determined the design
of the P-1 study. A more definitive analysis would require
additional data from a potentially smaller, yet more focused study,
in which an investigation of the relationship between clinical
depression and tamoxifen treatment was the primary scientific
objective. Such a study would have to have the following
minimum features: (a) a double-blind, placebo-controlled,
randomized design; (b) participants who are at high risk for breast
cancer, rather than breast cancer patients (to avoid potential
confounding due to clinical diagnosis and treatment); (c)
participants who are stratified on a reliable measure of risk for
affective disorder (e.g., lifetime diagnosis, Schedule for
Affective Disorders and Schizophrenia - Lifetime Version); (d)
periodic administration, in whole or in part, of a standardized
psychiatric diagnostic instrument (e.g., Diagnostic Interview
Schedule) by a trained interviewer; and (e) continued collection
of the psychiatric interview data even if the participant goes off
the assigned treatment for any reason, except death or consent
withdrawal. Whether the additional information obtained from
such a study would justify the time and the expense involved in
its collection is a problematic question that is beyond the scope
of this article.

ABSTRACT


Background: We examined the psychological effects of tamoxifen treatment for
breast cancer prevention on women at different levels of risk for clinical depression
to determine whether tamoxifen treatment is associated with the onset of clinical
depression and/or prolongs existing episodes in psychologically vulnerable
participants in the National Surgical Adjuvant Breast and Bowel Project's
Breast Cancer Prevention (P-1) study. Methods: We report on the baseline and first 36
months of follow-up data for 11,064 women, randomly assigned to daily doses of 20 mg
of  tamoxifen  or  placebo  in  the  P-1  study,  which  is  a  multi-center,  double-blind,  placebo- 
controlled  chemoprevention  trial.  Each  woman  was  prospectively  assessed  for 
depression  risk  on  the  basis  of  medical  history  items  collected  at  the  baseline 
examination  and  placed  in  a  high-,  medium-  or  low-risk  group.  Every  6  months,  the 
participants  were  assessed  for  depressive  symptoms  by  completing  the  Center  for 
Epidemiological  Studies-Depression  (CES-D)  questionnaire.  Scores  of  >16  were 
indicative  of  an  episode  of  affective  distress.  Differences  between  the  risk  groups  and 
treatment  arms  were  assessed  by  logistic  regression.  All  statistical  tests  were  two- 
sided.  Results:  Women  in  the  high-risk  depression  group  were  more  likely  to  score  >16 
on  the  CES-D,  more  frequently  and  for  longer  time  intervals  than  women  in  the  lower 
risk  groups.  Within  the  depression  risk  groups,  there  was  no  difference  in  the  proportion 
of women scoring >16 by treatment assignment (tamoxifen versus placebo) (OR = 0.98;
95% CI = 0.93 to 1.02). The lack of a tamoxifen effect was not a result of differential
missing data. Conclusions: Physicians need not be overly concerned that treatment
with tamoxifen will increase the risk for or exacerbate existing depression in women.
Nevertheless, physicians should screen for and treat or refer potential cases of
depression  encountered  in  routine  clinical  practice. 

INTRODUCTION 

Concern  regarding  an  association  between  clinical  depression  and  tamoxifen, 
when  used  as  an  adjuvant  treatment  or  preventative  agent  for  breast  cancer,  has  been 
voiced  by  a  number  of  investigators  (1-5)  and  continues  to  be  discussed  in  regulatory 
agencies,  such  as  the  U.S.  Food  and  Drug  Administration.  Furthermore,  the  Physician's 
Desk  Reference  (6)  lists  "depression"  as  an  infrequent  adverse  reaction  to  tamoxifen. 
Although  prior  studies  (1-5)  used  breast  cancer  patients  to  address  tamoxifen  use  and 
depression,  the  studies  had  a  number  of  weaknesses,  including  the  lack  of  a  clear 
definition  of  depression  and  a  failure  to  control  for  the  potential  confounding  effects  of 
illness  diagnosis,  the  side-effects  of  chemotherapy  (e.g.,  premature  menopause),  or 
normal  aging.  Previously,  two  double-blind,  placebo-controlled  studies  of  the  effects  of 
tamoxifen  in  post-menopausal  women  (7-8)  found  no  association  with  depression.  We 
believe  that  some  of  the  concern  over  the  relationship  between  tamoxifen  and 
depression  arises  from  the  idea  that  because  hormone  replacement  therapy  has 
positive  effects  on  mood  and  tamoxifen  has  anti-estrogenic  activity  (9-11),  tamoxifen, 
therefore,  has  negative  effects  on  mood. 

The  completion  of  the  National  Surgical  Adjuvant  Breast  and  Bowel  Project 
(NSABP)  Breast  Cancer  Prevention  Trial  (P-1)  provides  an  opportunity  to  investigate 
the  association  between  tamoxifen  and  depression  in  greater  detail.  The  P-1  study  was 
a  multi-center,  double-blind,  placebo-controlled  chemoprevention  trial.  The  primary 
objective of the study was to evaluate whether 5 years of tamoxifen therapy would
reduce the incidence of invasive breast cancer in women at an increased risk for the
disease. The secondary objectives included assessing the incidence of ischemic heart
disease,  bone  fractures,  and  other  negative  health  events,  such  as  depression,  that 
might  be  associated  with  tamoxifen  therapy.  Eligible  participants  were  randomly 
assigned  to  receive  20  mg  daily  of  tamoxifen  or  a  placebo  for  5  years.  Detailed  reports 
on  the  rationale,  planning,  design,  and  clinical  outcome  of  the  P-1  study  are  available 
elsewhere  (12-16). 

In  our  initial  health-related  quality  of  life  (HRQL)  publication  (16)  of  all  subjects  in 
the  P-1  study,  we  did  not  find  a  difference  between  the  treatment  groups  (tamoxifen 
versus  placebo)  on  the  Center  for  Epidemiological  Studies-Depression  (CES-D)  scale 
(21)  or  the  SF-36  Mental  Health  scale  (35).  It  is  known,  however,  that  vulnerability  to 
clinically  identifiable  forms  of  depression  is  not  uniformly  distributed  in  the  general 
female  population,  but,  instead,  clusters  in  high-risk  groups  of  women  (17).  This 
vulnerability  to  depression  may  be  inherited,  suggesting  a  genetic  or  familial  origin,  or  it 
may  be  related  to  certain  psychological  predispositions,  such  as  a  low  self-esteem,  a 
poor  resistance  to  stress,  or  a  pessimistic  view  of  the  world.  We  were  concerned  that 
the potential negative effects of tamoxifen for women at high risk for depression may
have been masked in our previous analysis (16) because of the simultaneous
inclusion of a larger group of less vulnerable (i.e., low-risk) participants. In this study,
we  investigated  the  effects  of  tamoxifen  for  women  at  different  levels  of  risk  for 
depression. Specifically, we were interested in whether tamoxifen treatment was
associated with the onset of clinically diagnosable depression, or prolonged existing
episodes, in women at high risk for depression.

PATIENTS AND METHODS


Participant  cohort  and  HRQL  data 

This article covers the baseline and first 36 months of follow-up data on the same
11,064 women used in the initial HRQL report (16) from the P-1 study. The P-1
participants ranged in age from 35 to 79 years (mean = 53.8 ± 9.2 years), were
predominantly white (95.6%), well educated (> some college = 64.9%), and currently
employed (full or part-time = 64.7%) in a professional or technical field (67.9%).
Detailed descriptions of this cohort of participants and the P-1 HRQL instruments are
available in previous publications (14,16). All investigations conducted in the P-1 study
were  approved  by  review  boards  at  each  institution  and  were  in  accord  with  an 
assurance  filed  with  and  approved  by  the  U.S.  Department  of  Health  and  Human 
Services  (12).  All  participants  provided  written,  informed  consent. 

Defining  Depression 

Depressive  disorders,  as  defined  by  the  current  psychiatric  nomenclature  in  the 
Diagnostic  and  Statistical  Manual  of  Mental  Disorders  IV  (DSM-IV,  18),  come  in  a 
variety  of  forms  that  differ  on  the  basis  of  the  number,  severity,  and  persistence  of 
symptoms.  The  majority  of  clinically  diagnosable  episodes  of  depression  involve  one  of 
three  disorders  -  major  depression,  dysthymia,  or  a  bipolar  illness  (17).  Major 
depression  involves  an  illness  episode  lasting  at  least  2  weeks  that  includes  mood 
disturbance  (dysphoria)  and  at  least  four  of  the  following  symptoms:  sleep  disturbance, 
change  in  psychomotor  activity,  loss  of  ability  to  experience  pleasure  and  interest, 
fatigue,  feelings  of  worthlessness  or  guilt,  difficulty  in  concentrating,  and  a 
preoccupation with death or a wish to die. These symptoms must be associated with a
clear impairment in social functioning. Dysthymic disorder or dysthymia is a chronic
illness lasting at least 2 years. Dysthymia does not show the same levels of social
impairment found in major depression, but does involve mood disturbance (dysphoria)
and a loss of the ability to experience pleasure and interest in usual activities, together
with some of the other symptoms used to define major depression. Individuals
diagnosed with dysthymia often experience episodes of major depression during their
lifetime. DSM-IV distinguishes bipolar disorders from depressive disorders. Bipolar
disorders have dramatic clinical manifestations that involve one or more episodes of
hypomania during an individual's lifetime alternating with illness episodes that fit the
criteria for major depressive disorder.

Depression was previously defined by the Research Diagnostic Criteria (RDC,
19), a non-clinical forerunner of the current DSM-IV criteria. The RDC used criteria
similar to those of the DSM-IV to define "major depression" and included a diagnosis
of "minor depression" (nonpsychotic episodes of illness characterized by a prominent and
sustained dysphoria, but lacking all the symptomatic features of major depression) that
was not included in the DSM-IV. Although important historically, the RDC has been
superseded by the DSM-IV.

One of the problems associated with the definition of depression is that in
addition to these diagnosable clinical entities, there are multiple sources of affective
distress that may result in short-term or self-limiting expressions of depressive
symptoms without meeting the DSM-IV criteria outlined above. The best available data
on rates of clinically diagnosable depressive disorders in the United States general
population come from the National Institute of Mental Health's Epidemiological
Catchment Area Study (ECA, 17). ECA investigators found that, even though clinically
diagnosable depressive disorders are relatively rare, usually affecting only 5-6% of the
general female population during any 12-month period, the reporting of depressive
symptoms is reasonably frequent, with 35.7% of the women in the ECA study (17)
reporting having experienced a period of dysphoria (feeling sad or blue) lasting at least
2 weeks. These expressions of affective distress, which fail to meet the clinical criteria
for major depression, dysthymia, or bipolar illness, are often associated with
occurrences such as uncomplicated grief, medical illness and other life events, or
chronic difficulties (20). Depressive symptoms may also occur secondary to other
psychiatric illnesses (e.g., anxiety disorders or phobias), chronic medical conditions, or
substance abuse.

Monitoring  Depressive  Symptoms  in  the  P-1  Study 

The  primary  instrument  used  to  monitor  depressive  symptoms  in  the  P-1  study 
was  the  CES-D  (21).  This  self-administered  questionnaire  was  designed  to  be  a  brief, 
first  stage  screen  rather  than  a  clinical  diagnostic  instrument.  The  CES-D  is  composed 
of  20  items,  each  of  which  is  scored  on  a  scale  of  0-3.  Higher  scores  reflect  increased 
expression  of  affective  distress  and  a  score  of  >16  is  most  often  used  as  the  cut-off 
point  for  likely  cases  of  clinical  depression  (21-23). 
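
The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the P-1 data-processing code: the function names are hypothetical, the items are assumed to be already coded 0-3 (with any reverse-keyed items handled upstream), and whether a total of exactly 16 counts as above the cut-off is left as a parameter because the text writes the threshold as ">16".

```python
from typing import Sequence

N_ITEMS = 20                 # the CES-D has 20 items (from the text)
ITEM_MIN, ITEM_MAX = 0, 3    # each item is scored 0-3 (from the text)


def cesd_total(items: Sequence[int]) -> int:
    """Sum the 20 item scores; the total therefore ranges from 0 to 60.

    Assumption: items are already coded 0-3, so no reverse-keying is done here.
    """
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    for value in items:
        if not ITEM_MIN <= value <= ITEM_MAX:
            raise ValueError(f"item score out of range: {value}")
    return sum(items)


def above_cutoff(total: int, cutoff: int = 16, inclusive: bool = False) -> bool:
    """Flag a likely case of affective distress.

    `inclusive` is a hypothetical switch: False mirrors the ">16" notation
    used in the text; True treats a total of exactly 16 as above the cut-off.
    """
    return total >= cutoff if inclusive else total > cutoff


# Hypothetical respondent who answers "1" to every item.
example_total = cesd_total([1] * N_ITEMS)
example_flag = above_cutoff(example_total)
```

Keeping the threshold and its inclusiveness as explicit parameters makes it easy to rerun an analysis under either reading of the cut-off.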

There  are  two  problems  associated  with  use  of  the  CES-D  alone  to  screen  for 
clinically  diagnosable  episodes  of  depression.  First,  questions  on  the  CES-D  only 
inquire  about  the  past  7  days,  collecting  little  information  on  the  length  of  time  that  a 
symptom  has  been  present.  Second,  the  CES-D  only  collects  information  on  symptoms 
and  not  the  degree  of  social  impairment  experienced  by  the  respondent.  Consequently, 
scores above the CES-D clinical cut-off point of 16 tend to include a substantial
proportion of distressed individuals (perhaps one-half or more) who do not
meet the clinical criteria for major depression, dysthymia, or bipolar illness (23,24).

Estimating Depression Risk in P-1 Participants

The  P-1  eligibility  criteria  permitted,  at  the  discretion  of  the  local  site  investigator, 
the  inclusion  of  women  with  evidence  of  clinical  depression.  Twenty  to  22%  of  the 
participants  scored  >16  on  the  CES-D  at  least  once  during  any  12-month  period  of  the 
P-1  study.  This  figure  exceeds  the  expected  general  population  rates  (5-6%,  [17])  of 
clinically diagnosable depressive disorders over a 12-month period by 3.5-4.0 times,
indicating  that  it  is  necessary  to  distinguish  between  clinically  diagnosable  episodes  of 
depression  and  depressive  symptoms  that  are  secondary  to  other  types  of  physical  and 
psychiatric  illnesses  or  a  consequence  of  social  conditions  that  produce  short  term,  self- 
limiting  expressions  of  affective  distress.  The  preferred  means  to  make  such  a 
distinction  would  be  a  standardized  psychiatric  interview,  such  as  the  Schedule  for 
Affective  Disorders  and  Schizophrenia  -  Lifetime  Version  (25)  or  the  Diagnostic 
Interview  Schedule  (17).  However,  in  the  absence  of  such  an  interview,  the  best  single 
indicator  of  risk  for  a  future  episode  of  major  depression,  dysthymia,  or  bipolar  disorder 
in the P-1 data is a prior medical history of treatment for these disorders (26-29).

The ECA study (17) found that the mean age at onset for major depressive
disorders  in  the  general  population  was  27  years,  with  approximately  89%  of  all  first 
depressive  episodes  occurring  before  age  35  years,  which  was  the  lower  age  limit  of  the 
participants  in  the  P-1  study.  Medical  history  information,  collected  on  a  one-time  only 
basis  as  a  part  of  the  baseline  entry  and  eligibility  assessment  of  all  P-1  participants, 
included  three  self-reported  items  regarding  depression:  (a)  a  prior  medical  history  of 
depression;  (b)  current  or  previous  prescriptions  for  anti-depressant  medications;  and 


Draft  #11  8.10.01  JNCI#3 


(c)  extended  periods  (12  or  more  months)  of  dysphoric  mood  (i.e.,  "depressed  or  sad 
most  days").  If  a  participant  gave  a  positive  answer  to  either  the  medical  history  or  the 
medication  question,  the  interviewer  obtained  dates  of  treatment,  physicians'  names, 
specific  modalities  of  treatment,  and  date  of  last  medication  dose  to  assess  the 
consistency  and  appropriateness  of  the  information  provided. 

These  three  medical  history  items  were  used  in  the  current  study  to  prospectively 
estimate  each  participant's  risk  of  experiencing  a  clinically  diagnosable  episode  of 
depression. A simple three-level risk score was determined for each P-1 participant
depending  on  whether  they  endorsed  0  (low-risk),  1  or  2  (medium-risk),  or  3  (high-risk) 
of  the  medical  history  items  regarding  depression  in  the  Entry/Eligibility  Form.  We 
hypothesized  that  women  with  higher  scores  on  this  simple  depression  risk  scale  would 
experience  more  severe  and  persistent  episodes  of  affective  distress,  and  would  be 
more  likely  to  receive  a  clinical  diagnosis  of  depression.  Moreover,  if  tamoxifen  was 
associated  with  the  onset  and/or  prolonged  the  length  of  depressive  episodes  in  the 
high-risk  (i.e.,  more  vulnerable)  group,  it  should  be  apparent  from  longitudinal 
differences  in  the  proportion  of  P-1  participants  in  the  treatment  groups  (tamoxifen 
versus  placebo)  who  scored  >16  on  the  CES-D. 
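
The three-level risk assignment described above amounts to counting endorsed items. A minimal Python sketch follows; the function and argument names are hypothetical and are not taken from the P-1 Entry/Eligibility Form.

```python
def depression_risk_group(prior_history: bool,
                          antidepressant_rx: bool,
                          extended_dysphoria: bool) -> str:
    """Map the three self-reported medical history items to a risk group.

    Rule from the text: 0 items endorsed = low risk, 1 or 2 = medium risk,
    3 = high risk. Argument names are illustrative assumptions.
    """
    endorsed = sum([prior_history, antidepressant_rx, extended_dysphoria])
    if endorsed == 0:
        return "low"
    if endorsed == 3:
        return "high"
    return "medium"


# Hypothetical participant reporting a prior history and medication use,
# but no extended period of dysphoric mood.
example_group = depression_risk_group(True, True, False)
```

Because the score is a simple count, every participant with complete baseline data falls into exactly one of the three groups.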

Statistical  Analysis 

CES-D scores were analyzed as above or below the clinical cut-off of >16.
Binary logistic regression was the primary method of statistical analysis used in this
study. Estimated odds ratios (ORs), confidence intervals (CIs), and P-values are provided
for all inferential analyses. Cox regression analysis was used to investigate the effects
of treatment and depression risk on the time to the first CES-D with a score >16 and
Kaplan-Meier plots are provided for these data. When the CES-D data were handled as
a continuous variable, nonparametric equivalents to a one-way ANOVA (i.e., Kruskal-Wallis
test) were used because it is unusual for CES-D scores to be normally
distributed. Graphic presentations include 95% CIs on observed proportions to provide
the reader with visual criteria for the magnitude of potential variation. Reported
P-values are all two-sided and have not been adjusted for multiple statistical comparisons.
Instead, we have chosen to focus on consistent patterns of findings, rather than
individual statistical tests in forming our conclusions. We also avoided the use of
statistical methods for imputation of missing data points in the primary data because the
data did not meet the strong assumptions that normally underlie such procedures (e.g.,
MCAR/MAR). Analyses were carried out using Minitab (Ver. 13) and Egret (Ver. 1.0).
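
To make the odds ratio and its confidence interval concrete, the sketch below computes an unadjusted OR with a Wald 95% CI from a single 2x2 table. This is a simplification and an assumption on our part: the analyses in this study used logistic regression with adjustment for other factors (run in Minitab and Egret), and the counts shown are invented for illustration only. For a single binary predictor with no covariates, the exponentiated logistic regression coefficient equals this 2x2 odds ratio.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.959964) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b: events / non-events in group 1 (e.g., tamoxifen)
    c, d: events / non-events in group 2 (e.g., placebo)
    z:    normal quantile; 1.959964 gives a two-sided 95% interval
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT P-1 data): women scoring above / below the cut-off.
example_or, example_lo, example_hi = odds_ratio_wald(200, 800, 210, 790)
```

An interval that contains 1.0 corresponds to a statistically nonsignificant difference at the two-sided 5% level.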

RESULTS 

Depression  risk 

To  determine  whether  there  was  an  association  between  depression  and 
tamoxifen  treatment  in  participants  of  the  P-1  study,  we  first  calculated  the  depression 
risk  score  from  the  frequency  of  responses  to  each  one  of  the  medical  history  items 
(Table  1).  The  three  components  of  this  score  were  only  moderately  intercorrelated. 
The  highest  correlation  occurred  between  a  prior  history  of  illness  and  anti-depressant 
medications  (r=  0.564,  P  <  .001),  followed  by  prior  history  and  persistent  dysphoric 
mood  (r=  0.369,  P<  .001),  and  medications  and  dysphoric  mood  (r=  0.269,  P<  .001). 
Overall  depression  risk,  measured  by  the  data  from  this  study,  was  not  statistically 
significantly  related  to  the  participants'  risk  of  breast  cancer,  as  measured  by  the  Gail 
risk model (12, 30).

The construct validity of this depression risk score was evaluated, in part, with
the use of the social and demographic factors associated with clinically diagnosable
depressive disorders in the ECA study (17). Table 2 shows the distribution of P-1
participants according to the three-level depression risk scale on seven demographic
variables, which approximate those associated with clinically diagnosable depression in
the ECA study (17). All of these variables, except education, showed a statistically
significant dose-response relationship to the depression risk scores in terms of the
direction and intensity of the association.

CES-D data

Figure 1a shows the proportion of the participants in each depression risk group
who scored above the clinical cut-off of >16 on the CES-D at baseline and at each of
the follow-up examinations. A consistent, positive dose-response relationship exists
between depression risk, as determined on the basis of the medical history items,
and the proportion of participants scoring >16 on the CES-D at each scheduled
examination. For each depression risk group, Table 3 shows the mean proportion of
follow-up examinations with scores >16, and the distribution of the maximum and the
overall scores on CES-D examinations above the clinical cut-point. A positive
dose-response relationship was also observed between depression risk group and proportion
of respondents who scored >16 on sequential CES-D examinations. In the high-risk
depression group, for example, 21.2% of the respondents scored >16 on three or more
sequential CES-D examinations, compared with 9.7% for the medium-risk and 3.5% for
the low-risk groups. These findings confirm the expectation that participants in the
higher depression risk groups (low < medium < high), on average, tend to experience
more persistent and severe episodes of affective distress.

We  next  analyzed  the  CES-D  data  from  each  depression  risk  group  by  treatment 
group  (tamoxifen  versus  placebo)  (Figures  1b-d  and  Table  4).  After  adjusting  for 
examination and risk group, logistic regression showed no statistically significant
difference between the tamoxifen group and the placebo
group (OR = 0.98; 95% CI = 0.93 to 1.02; P = .32). These analyses indicate that
treatment  group  is  not  statistically  associated  with  the  proportion  of  women  scoring 
above  the  CES-D  clinical  cut-off  of  >16  in  any  of  the  three  depression  risk  groups. 
Furthermore,  after  adjusting  for  depression  risk  group,  an  analysis  of  variance  found 
that  there  was  no  difference  in  the  mean  individual  proportion  of  follow-up  examinations 
above  the  clinical  cut-off  in  each  treatment  arm. 

The  Kaplan-Meier  plot  in  Figure  2  shows  the  relationship  between  assigned 
treatment  (placebo/tamoxifen)  and  depression  risk  group  (high/medium/low)  for  the  time 
from  randomization  until  the  first  CES-D  examination  with  a  score  exceeding  the  clinical 
cut-off of >16. The results of Cox proportional hazards regression analysis with these
data were statistically significant for depression risk group (likelihood ratio statistic [LRS] P <
.001; hazard ratio [HR] = 1.88; 95% CI = 1.74 to 2.05), but statistically nonsignificant for both
treatment arm (LRS P = .988; HR = 1.00; 95% CI = 0.92 to 1.09) and interaction effects (LRS
P = .575; HR = 1.03; 95% CI = 0.92 to 1.16). The proportional hazards assumption for this
analysis was confirmed.
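
For readers unfamiliar with the Kaplan-Meier method behind Figure 2, the following Python sketch implements the product-limit estimator for time to first CES-D score above the cut-off. It is a minimal illustration under stated assumptions: the function name and the five-participant data set are hypothetical, and the Cox regression reported above is not reproduced here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the event-free proportion.

    times:  months from randomization to the first score above the cut-off,
            or to the last completed examination if censored
    events: True if the participant scored above the cut-off at that time,
            False if the observation was censored
    Returns (time, event-free proportion) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0    # events observed exactly at time t
        n_leaving = 0   # events plus censored observations at time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events > 0:
            event_free *= 1 - n_events / n_at_risk
            curve.append((t, event_free))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up for five participants (NOT P-1 data).
example_curve = kaplan_meier([6, 12, 12, 18, 24],
                             [True, True, False, True, False])
```

Participants censored at a given time are counted as still at risk at that time, which is the usual convention for tied event and censoring times.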

Missing  data 

We next assessed the association between missing data and depression risk
group or sequential CES-D examination (Figure 3a). Logistic regression analysis based
on the data in Figure 3a indicated that depression risk group (OR = 1.17; 95% CI = 1.13
to 1.21; P < .001) and sequential examination (OR = 1.45; 95% CI = 1.44 to 1.46; P <
.001) were both statistically significantly associated with missing CES-D data. Figures
3b-d show the proportion of participants completing the CES-D by depression risk and
treatment groups. Logistic regression analysis by depression risk, controlling for
sequential examination, indicates that, compared with those receiving the placebo,
tamoxifen treatment was associated with higher proportions of missing data in the
low-risk (OR = 1.11; 95% CI = 1.06 to 1.16; P < .001) and the medium-risk groups (OR = 1.12;
95% CI = 1.04 to 1.21; P < .001), but not in the high-risk group (OR = 0.99; 95% CI =
0.84 to 1.16; P = .91). If tamoxifen-associated depression was the primary cause of
these missing data, we would have predicted a positive (dose-response) increase in the
magnitude of the odds ratios from the lowest to the highest depression risk group.
No such gradient was observed: the odds ratio for the high-risk group was the lowest
of the three.
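
The dose-response reasoning in the preceding paragraph can be checked directly against the odds ratios reported there. The Python sketch below is a descriptive check only, not a formal test for trend; the helper names are hypothetical, while the numbers are the reported estimates for missing data by treatment arm.

```python
from typing import Dict, List, Sequence, Tuple

# (risk group, OR, lower 95% limit, upper 95% limit) as reported in the text.
REPORTED: List[Tuple[str, float, float, float]] = [
    ("low", 1.11, 1.06, 1.16),
    ("medium", 1.12, 1.04, 1.21),
    ("high", 0.99, 0.84, 1.16),
]


def is_strictly_increasing(values: Sequence[float]) -> bool:
    """True if each value is larger than the one before it."""
    return all(x < y for x, y in zip(values, values[1:]))


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True if the interval lies entirely above or entirely below 1.0."""
    return lower > 1.0 or upper < 1.0


# A tamoxifen-driven depression effect would predict ORs rising low -> high.
gradient_present = is_strictly_increasing([row[1] for row in REPORTED])

# Which groups show an interval that excludes an OR of 1.0?
excludes_one: Dict[str, bool] = {
    name: ci_excludes_one(lower, upper) for name, _, lower, upper in REPORTED
}
```

With these values the odds ratios do not rise from the low-risk to the high-risk group, which is the pattern the paragraph uses to argue against tamoxifen-associated depression as the main driver of missing data.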

We  noted  in  our  previous  paper  (15)  that  it  was  difficult  to  continue  to  collect 
quality  of  life  data  after  a  participant  had  gone  off  treatment.  However,  participants  in 
the  P-1  study  were  asked  about  their  primary  reason  for  going  off  treatment  and  their 
responses  were  recorded  on  an  Off  Therapy  Form  (OTF)  that  included  "depression"  as 
one  of  10  specific  response  categories. 

Of the 11,064 participants in this cohort, we collected an OTF for 3,539 (80.7%)
of  4,382  women  who  missed  at  least  one  CES-D  examination.  The  presence  of  an  OTF 
showed  a  moderate  positive  correlation  with  the  total  number  of  missing  CES-D 
examinations  (a=0.62;  P  <  .001).  The  women  who  completed  an  OTF  accounted  for 
12,693  (89.7%)  of  14,149  missing  CES-D  examinations.  Only  110  (3.1%)  of  these 
women  reported  that  depression  was  the  primary  reason  for  going  off  therapy.  The 
most frequent reasons for going off therapy were non-medical in nature (1667 women
[47.1%]), perceived toxicities (921 women [26.0%]), and various protocol and
nonprotocol medical conditions (841 women [23.8%]). Table 5 shows the distribution of
women who reported that depression was their primary reason for going off treatment
by treatment group and depression risk group. An analysis of these data using binary
logistic regression found a statistically significant effect for depression risk group
(OR = 2.37; 95% CI = 1.83 to 3.07; P < .001) and a statistically nonsignificant effect for
treatment group (OR = 1.10; 95% CI = 0.75 to 1.62; P = .63), indicating that the cases of
depression that led women to quit their assigned treatment did not occur with a greater
frequency in those in the tamoxifen arm.
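
As a simple arithmetic check on the off-therapy breakdown given above, the short Python sketch below recomputes each percentage from the reported counts, using the 3,539 women with an OTF as the denominator. The category labels are paraphrased from the text and the variable names are hypothetical.

```python
# Counts of primary reasons for going off therapy, as reported in the text.
REASON_COUNTS = {
    "depression": 110,
    "non-medical": 1667,
    "perceived toxicities": 921,
    "protocol and non-protocol medical conditions": 841,
}

N_WITH_OTF = 3539  # women with a completed Off Therapy Form (from the text)

# The four reported groupings should account for every woman with an OTF.
total_reported = sum(REASON_COUNTS.values())

# Percentage of OTF completers in each grouping, rounded to one decimal.
percentages = {
    reason: round(100 * count / N_WITH_OTF, 1)
    for reason, count in REASON_COUNTS.items()
}
```

The four groupings sum to the full set of women with an OTF, and the recomputed percentages match those quoted in the paragraph.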

DISCUSSION 

Tamoxifen  is  the  most  widely  prescribed  anti-cancer  agent  currently  in  use.  It 
has  been  proven  to  be  effective  against  breast  cancer  as  an  adjuvant  treatment  and  in  a 
preventative  setting  (12,  31).  Given  the  widespread  use  of  tamoxifen,  it  is  important  to 
fully  investigate  all  of  the  potential  side-effects  that  may  be  associated  with  its 
administration,  so  that  women,  together  with  their  physicians,  can  make  an  informed 
decision  regarding  its  potential  costs  and  benefits  and  its  appropriateness  for  their 
individual  situations. 

This  study  is  an  extension  of  our  earlier  report  (16)  on  the  HRQL  data  from  the 
NSABP  P-1  study.  Previously,  we  found  no  evidence  for  an  association  between 
tamoxifen  treatment  and  depression  in  the  overall  P-1  study  cohort.  In  this  study,  we 
recognized  that  vulnerability  to  clinically  identifiable  depressive  disorders  is  not 
randomly  distributed  in  the  general  female  population  and  that  the  effects  of  tamoxifen 
on  susceptible  women  in  the  P-1  study  may  have  previously  gone  undetected. 

Our initial problem was the a priori identification of subgroups of women with a
potential  clinical  susceptibility  for  depression.  Because  the  self-administered 
depression  screening  form  (CES-D)  used  in  the  P-1  study  provides  information  on 
short-term  symptoms  of  affective  distress  and  is  not  intended  for  use  as  a  diagnostic 
instrument  (21),  we  incorporated  the  participants'  self-reported  medical  history  of 
depression,  use  of  prescription  anti-depressive  medications,  and  experience  of 
extended  periods  (greater  than  12  months)  of  dysphoric  mood  to  assign  clinical  risk. 
Based  on  these  data,  women  were  prospectively  assigned  to  one  of  three  depression 
risk groups. We hypothesized that the higher a woman's depression risk group, the greater
the  likelihood  that  she  would  experience  a  clinically  diagnosable  episode  of  depression. 
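
The assignment rule can be sketched in a few lines. The scoring follows the rule stated in the table footnotes of this report (one point per endorsed item; a score of 0 is low risk, 1-2 is medium risk, and 3 is high risk), and the pattern counts are those listed in Table 1. The function and variable names are hypothetical and are not taken from the P-1 data system.

```python
def depression_risk_group(prior_history, antidepressant_use, persistent_dysphoria):
    """Return (score, group) from three yes/no medical history items."""
    score = int(prior_history) + int(antidepressant_use) + int(persistent_dysphoria)
    if score == 0:
        group = "low"
    elif score <= 2:
        group = "medium"
    else:
        group = "high"
    return score, group


# Pattern counts from Table 1:
# (prior history, anti-depressant medication, persistent dysphoria) -> n
table1_counts = {
    (False, False, False): 7964,
    (False, False, True): 621,
    (False, True, False): 668,
    (True, False, False): 339,
    (False, True, True): 120,
    (True, False, True): 202,
    (True, True, False): 631,
    (True, True, True): 519,
}

# Tally the number of women falling in each risk group.
group_totals = {"low": 0, "medium": 0, "high": 0}
for pattern, n in table1_counts.items():
    _, group = depression_risk_group(*pattern)
    group_totals[group] += n

print(group_totals, sum(group_totals.values()))
```

Tallying the Table 1 patterns this way gives the group sizes used in the stratified analyses, with the medium-risk group combining the women who endorsed one or two items.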

The  P-1  staff  were  trained  to  check  the  consistency  and  appropriateness  of  the 
self-reported  data  about  prior  treatment  for  depression  and  the  use  of  anti-depressant 
medications as a routine part of the medical screening procedure carried out during
the entry/eligibility interview. These procedures were designed specifically to minimize
false  positive  classification  errors.  However,  there  was  little  that  the  interviewer  could 
do  to  detect  false  negative  classification  errors  in  which  a  potential  participant  did  not, 
for  whatever  reason,  report  the  requested  screening  information.  The  overall  effect  of 
this  inability  to  control  for  false  negative  classification  errors  for  the  current  study  was  to 
create  a  potential  misclassification  bias  in  which  women  at  increased  risk  for  depression 
may  have  been  placed,  at  an  unknown  rate,  in  one  of  the  lower  risk  groups.  Although 
less  than  ideal,  the  effect  of  this  bias  is  conservative  in  nature,  operating  to  maintain  the 
comparative  validity  of  the  most  important  high-risk  depression  group. 

We  found  a  statistically  significant  dose-response  relationship  between  the  level 
of  the  depression  risk  group  (high>medium>low)  and  the  proportion  of  the  women  in 
each depression risk group who scored above the clinical cut-off of >16 on the CES-D
at  baseline  and  at  every  follow-up  interview.  In  addition,  women  in  the  higher  risk 
groups  (high>medium>low)  scored  above  the  clinical  cut-off  on  a  greater  proportion  of 
their  follow-up  interviews  and,  on  average,  had  higher  maximum  CES-D  scores. 
Together,  these  data  suggest  that  there  was  a  dose-response  effect  in  which  women  in 
the  higher  depression  risk  groups  (high>medium>low)  were  more  likely  to  experience  a 
clinically  significant  episode  of  affective  distress  and  that  these  episodes,  on  average, 
were  more  persistent  and  severe  than  the  episodes  in  the  lower  risk  groups.  Finally,  we 
found  that  the  distribution  of  social  and  demographic  correlates  (i.e.,  age,  marital  and 
employment  status,  education,  use  of  medical  services)  across  the  three  depression 
risk  groups  defined  in  this  study  followed  the  same  general  patterns  of  risk  previously 
identified in the ECA study of depression among the general population (17). All of the
above  findings  serve  to  support  the  validity  of  the  risk  assignments  used  in  our  study. 

The  primary  test  of  our  research  question  involved  stratifying  each  depression 
risk  group  by  treatment  assignment  (tamoxifen  versus  placebo)  and  comparing  the 
corresponding  proportions  of  women  at  each  follow-up  interview  who  scored  above  the 
clinical cut-off of >16 on the CES-D. We found no effect of tamoxifen for any of the three
depression  risk  groups. 

Besides  the  lack  of  a  positive  association  between  tamoxifen  use  and 
depression,  there  are  at  least  two  possible  alternative  explanations  for  our  negative 
findings:  lack  of  statistical  power  and  missing  data.  We  carried  out  a  post-hoc  effect 
size  analysis  to  determine  the  size  of  the  difference  between  the  treatment  arms  that 
might  have  been  detected.  For  our  highest  risk  depression  group  (n=519),  we  had  an 
80%  chance  of  detecting  at  least  a  37%  (OR  >  1.37)  increase  between  the  two  study 
arms in the proportions of women scoring above the CES-D clinical cut-off of >16 at any
single examination point. When a repeated measures design was used, we had
sufficient power to detect a mean increase of 24% (OR > 1.24) in the proportion of
women in either arm scoring above the CES-D clinical cut-off (32-33). We considered
these to be acceptable levels of statistical power for the identification of clinically
significant treatment effects in our high-risk depression group. The detectable odds
ratios were, of course, even smaller for the low- and medium-risk depression groups.

We also assessed whether missing data could explain the lack of an
association between tamoxifen and depression in the P-1 study. An initial analysis
showed  that  assigned  depression  risk  was  statistically  significantly  associated  with 
missing  data  rates  over  the  course  of  the  study.  If  a  "tamoxifen-associated"  depression 
was  the  primary  cause  of  these  rates,  we  would  have  predicted  that  the  tamoxifen 
treatment  group  in  the  higher  depression  risk  groups  would  show  a  progressively 
greater  differential  off-treatment  rate  compared  with  the  placebo  group.  This 
expectation  was  not  confirmed  by  our  data  for  the  high-risk  depression  group. 

In  addition,  we  also  examined  the  reasons  given  for  going  off  the  assigned 
treatment.  There  was  a  strong  statistical  association  in  the  P-1  study  between  stopping 
assigned  treatment  and  missing  HRQL  data  (16).  An  analysis  of  the  reasons  for  going 
off  treatment  in  81%  of  the  women  with  missing  HRQL  data  resulted  in  the  following 
observations:  (a)  depression  was  cited  as  a  relatively  infrequent  reason  for  going  off 
treatment; (b) the higher the depression risk group, the greater the likelihood that
depression  was  cited  as  the  reason  for  going  off  treatment;  and,  (c)  within  each 
depression  risk  group,  depression  was  cited  as  the  reason  for  going  off  treatment  by 
similar  proportions  of  women,  regardless  of  treatment  assignment.  A  separate  paper 
(34) has implemented a sensitivity analysis on these data with equally negative results.
The findings in this report, together with this sensitivity analysis, indicate that there are
no  clear  patterns  in  the  missing  data  that  serve  to  undermine  the  conclusions  drawn 
from  our  primary  analysis. 

The  results  of  our  analysis  strengthen  our  previous  conclusion  regarding  lack  of 
evidence  for  an  association  between  tamoxifen  use  and  depression  in  the  P-1  data  by 
provisionally extending our findings to subgroups of women at high risk for clinically
identifiable  episodes  of  depression.  Clinically,  these  findings  have  two  major 
implications.  First,  the  evidence  from  NSABP's  P-1  study  does  not  lend  support  to  the 
idea that tamoxifen should be considered a causal risk factor for the onset of
depressive  symptoms  and/or  the  prolongation  of  depressive  episodes  that  occur  among 
treated  women.  Second,  the  findings  of  this  study  suggest  that  physicians  need  not 
automatically  disqualify  women  as  candidates  for  tamoxifen  treatment  simply  because 
they  report  a  history  of  depressive  symptoms  or  prior  treatment  for  a  depressive 
disorder.  Nevertheless,  it  is  still  essential  that  physicians  carefully  screen  for  affective 
disorders  and  treat  or  refer  potential  cases  of  depression  encountered  in  routine  clinical 
practice. 

Finally,  there  are  two  important  limitations  on  these  conclusions  that  require 
discussion,  one  statistical  and  the  other  methodological.  Statistically,  it  was  the  large 
size  of  the  P-1  study  that  permitted  us  to  identify  and  carry  out  stratified  analyses  of 
groups  of  women  with  a  differential  risk  for  depression.  However,  we  also  noted  that 
there  were  limits  on  our  statistical  power  to  detect  an  increase  in  the  proportion  of 
women  reporting  clinically  significant  levels  of  depressive  symptoms  on  the  CES-D, 
particularly in the high-risk depression group. For this reason, we cannot absolutely
exclude the possibility that there may be rare cases in which women react negatively to
tamoxifen treatment with potentially life-threatening depressions. Here it is useful to
recall that data on neuro-mood toxicities were collected for P-1 participants and
periodically reviewed as part of the routine safety monitoring procedures. Over the full
course of the P-1 study, there were a total of three suicides, one from the placebo and
two from the tamoxifen-treated group, and there were no statistically significant
differences in the distribution of women reporting suicidal ideation across the two trial
arms.

Draft #11 8.10.01 JNCI#3

The  methodological  limitations  of  this  report  (i.e.,  the  lack  of  standardized 
psychiatric  diagnoses  and  missing  HRQL  data)  are  primarily  due  to  the  fact  that  the 
goals  of  this  study  were  secondary  to  the  main  clinical  objectives  that  determined  the 
design  of  the  P-1  study.  A  more  definitive  analysis  would  require  additional  data  from  a 
potentially  smaller,  yet  more  focused  study  in  which  an  investigation  of  the  relationship 
between  clinical  depression  and  tamoxifen  treatment  was  the  primary  scientific 
objective.  Such  a  study  would  have  to  have  the  following  minimum  features:  (a)  a 
double-blind,  placebo-controlled,  randomized  design;  (b)  participants  who  are  at  high 
risk  for  breast  cancer,  rather  than  breast  cancer  patients  (to  avoid  potential  confounding 
due to clinical diagnosis and treatment); (c) participants who are stratified on a reliable
measure of risk for affective disorder (e.g., lifetime diagnosis, Schedule for Affective
Disorders and Schizophrenia - Lifetime version); (d) periodic administration, in whole or
in part, of a standardized psychiatric diagnostic instrument (e.g., Diagnostic Interview
Schedule)  by  a  trained  interviewer;  and  (e)  continued  collection  of  the  psychiatric 
interview  data  even  if  the  participant  goes  off  the  assigned  treatment  for  any  reason, 
except  death  or  consent  withdrawal.  Whether  the  additional  information  obtained  from 
such a study would justify the time and the expense involved in its collection is a
problematic question that is beyond the scope of this report.

Figure Legends


Figure 1a-d: Proportion of P-1 Participants Scoring >16 on the CES-D with 95%
Confidence Intervals by Depression Risk Groups (Low, Medium, High) Alone (1a) and
by Depression Risk Group and Treatment Assignment (Placebo/Tamoxifen) (1b-d)

Figure  2:  Kaplan-Meier  Plot  of  Time  from  Randomization  to  First  CES-D  >16  by 
Depression  Risk  Group  (Low,  Medium,  High)  and  Treatment  Assignment 
(Placebo/Tamoxifen) 

Figure  3a-d:  Proportion  of  P-1  Participants  Completing  the  QoL  Questionnaire  by 
Depression  Risk  Groups  (Low,  Medium,  High)  Alone  (3a)  and  by  Depression  Risk 
Group  and  Treatment  Assignment  (Placebo/Tamoxifen)  with  95%  Confidence  Intervals 
(3b-d) 

Table 1.

Distribution of Self-Reported Risk Factors for Clinical Depressive Disorders at Baseline
Examination

Risk Group         Prior History   Anti-Depressant   Persistent   Risk Factor Pattern   0-3 Risk Factors
(Items Endorsed)   of Depression   Medications       Dysphoria        n        %            n        %

Low (0)            no              no                no            7964    71.99         7964    71.99

Medium (1)         no              no                yes            621     5.61         1628    14.71
                   no              yes               no             668     6.04
                   yes             no                no             339     3.06

Medium (2)         no              yes               yes            120     1.08          953     8.61
                   yes             no                yes            202     1.83
                   yes             yes               no             631     5.70

High (3)           yes             yes               yes            519     4.69          519     4.69

Table 2

Distribution of P-1 Participants on ECA Social and Demographic Correlates of Clinically
Diagnosed Depressive Disorders by Depression Risk Score

                                Depression Risk Score*
Sociodemographic item       Low (%)   Medium (%)   High (%)   Odds Ratio†   95% Confidence
                                                                            Interval on OR

Marital status:
  divorced/separated         11.1       17.7         23.5        1.63         1.50-1.98

Employment status:
  not working                 4.4        7.9         12.2        1.78         1.58-2.01

Visited doctor within
  last 3 months              71.0       76.4         84.4        1.39         1.28-1.51

Hospitalized within
  last 5 years               42.7       48.6         54.9        1.27         1.19-1.36

Age:
  >60                        29.9       27.4         24.1        0.87         0.81-0.94

Education:
  > High School              66.6       66.7         70.0        1.04         0.97-1.12

Income:
  > Median                   46.1       37.6         31.5        0.72         0.67-0.77

* Depression risk groups were assigned on the basis of the participants' responses to three medical history
questions: prior history of depression, use of anti-depressant medication, persistent mood disturbance
(dysphoria). Each positive answer was worth one point. Participants with a score of 0 were assigned to
the low-risk group, those with a score of 1-2 to the medium-risk group, and those with a score of 3 to the
high-risk group.

† Odds ratios were determined by binary logistic regression, P < .001 for all groups compared with
referent groups, except for education where P = .235


Table 3

Distribution of Center for Epidemiological Studies-Depression (CES-D) Scale Variables
for P-1 Participants who Scored Above the Clinical Cut-off of >16 by Depression Risk
Group

                                     Depression Risk Group*
CES-D Variable               Low             Medium          High

% follow-up exams >16†
  mean:                      0.087           0.192           0.357
  95% CI for mean:           0.083-0.091     0.181-0.203     0.325-0.389

Maximum score >16†
  median:                    22              24              27
  mean:                      23.97           25.61           28.58
  95% CI for mean:           23.66-24.28     25.16-26.06     27.62-29.54

All scores >16†
  median:                    20              21              22
  mean:                      21.52           22.49           23.74
  95% CI for mean:           21.30-21.74     22.17-22.81     23.10-24.38

* Depression risk groups were assigned on the basis of the participants' responses to three
medical history questions: prior history of depression, use of anti-depressant medication,
persistent mood disturbance (dysphoria). Each positive answer was worth one point.
Participants with a score of 0 were assigned to the low-risk group, those with a score of 1-2
to the medium-risk group, and those with a score of 3 to the high-risk group.

† For all between-group comparisons using Kruskal-Wallis and ANOVA: P < .001

Table 4

Comparison (binary logistic regression) of the Proportion of P-1 Participants in Each Treatment Group (tamoxifen
versus placebo) who Scored >16 on the Center for Epidemiological Studies-Depression (CES-D) Scale by
Depression Risk Group and Sequential Examination

Table 5

Reasons Cited for Going Off Treatment
by Depression Risk* and Treatment Group

[Figure panel titles: 1a - Depression Risk Groups; Treatment Group and Depression Risk]

Abbreviated Title: Quality of Life in a Breast Cancer Prevention Trial

Abstract


This  report  contains  a  brief  summary  of  the  health-related  quality  of  life  findings  for 
11,064 women taking part in the National Surgical Adjuvant Breast and Bowel Project's
P-1  trial.  Women  taking  part  in  this  trial  of  tamoxifen  versus  placebo  for  breast  cancer 
prevention  were  >  35  years  old  and  predominately  white,  well-educated,  and  middle- 
class,  with  a  strong  professional  and  technical  orientation.  Key  findings  included  a  lack 
of  difference  between  the  tamoxifen  and  placebo  arms  with  regard  to  depression, 
overall  physical  or  mental  quality  of  life,  or  weight  gain.  The  tamoxifen  arm  did  show 
consistent  increases  in  vasomotor  (hot  flashes)  and  gynecological  (vaginal  discharge) 
symptoms,  as  well  as  difficulties  in  certain  domains  of  sexual  functioning.  It  is 
concluded  that  an  informed  discussion  with  a  woman  considering  tamoxifen  therapy 
should  include  these  points  in  the  risk-benefit  discussion. 

Introduction


This  is  a  brief  summary  of  the  findings  from  the  health-related  quality  of  life 
(HRQL)  component  of  the  National  Surgical  Adjuvant  Breast  and  Bowel  Project's 
(NSABP) P-1 trial, a multicenter, double-blinded, placebo-controlled clinical trial
designed  to  evaluate  whether  5  years  of  tamoxifen  therapy  would  reduce  the  incidence 
of  invasive  breast  cancer  in  women  at  an  increased  risk  for  the  disease.  Detailed 
descriptions of the rationale, planning and design of the P-1 study and its HRQL
component,  as  well  as  specific  instruments,  are  available  in  separate  reports.^'® 

Subjects  and  Instruments 

This  summary  focuses  on  the  baseline  HRQL  examination  and  the  first  36 
months of follow-up data on 11,064 women recruited over the first 24 months of the
study.  The  P-1  HRQL  Questionnaire  was  composed  of  the  Center  for  Epidemiological 
Studies  -  Depression  Scale  (CES-D),  the  Medical  Outcomes  Study  (MOS)  Short  Form 
(SF-36),  the  MOS  sexual  functioning  scale,  and  a  symptom  checklist  (SCL).  The 
questionnaire  was  to  be  administered  to  all  participants  prior  to  randomization 
(baseline),  at  3  months  and  at  each  succeeding  6  month  examination. 

Results 

The participants in the P-1 study were predominately white (96%), well-educated
(65%  >  some  college),  married  (70%),  professional  and  technically  trained  (68.2%) 
women,  who  were  currently  employed  (64.9%)  and  reported  a  middle  to  upper  middle 
class  family  income  (median  $35-49,999). 

Figure  1  shows  the  overall  proportion  and  total  numbers  of  women  completing 
the  HRQL  questionnaire  at  each  examination.  It  provides  a  measure  of  comparative 
participant adherence with regard to the HRQL questionnaire in the two trial groups.
Analysis of sociodemographic and medical variables indicated that participants failing to
complete the HRQL questionnaire in each group were similar cohorts of women.

Draft 3 (fin) 05.30.01

Figure  2  shows  the  proportion  of  P-1  participants,  by  group  and  examination, 
scoring  above  the  most  frequently  used  clinical  cut-off  (>16)  on  the  CES-D.®'^  The 
youngest  age  group  (35-49  yrs.)  in  both  trial  groups  consistently  had  the  highest 
proportion  of  members  scoring  above  the  clinical  cut-off,  followed  by  the  50-59  yrs.  age 
group. Similar findings with regard to the relationship between the two trial groups
emerged  from  the  analysis  of  the  5-item  mental  health  subscale  on  the  MOS  SF-36  (not 
shown). 

The  SF-36  results  are  summarized  in  Figure  3  using  the  physical  and  mental 
component  scores  (PCS,  MCS).®  Mean  PCS  declines  across  the  age  groups.  On 
follow-up  examinations,  the  tamoxifen  group  was  consistently  lower  on  the  PCS  only  in 
the  50-59  yrs.  age  group  (one-sided  sign  test,  p=0.065);  however,  the  absolute 
differences  were  very  small,  approximating  1/10  of  a  standard  deviation.  No  consistent 
differences  emerged  on  the  MCS  between  the  two  trial  groups. 

Table  1  provides  information  on  the  proportion  of  women  in  the  tamoxifen  and 
placebo  groups  reporting  symptoms  on  the  SCL  at  least  once  during  the  period  that  the 
participants  were  on  treatment  -  i.e.,  the  period  excluding  baseline,  but  including  the  7 
follow-up  examinations.  The  5  symptoms  with  the  greatest  relative  difference  between 
the  two  trial  groups  are  given  for  each  age  group  and  the  10  symptoms  with  the  greatest 
relative  difference  are  presented  for  all  participants  combined. 

Figure  4  summarizes  the  information  from  the  5  items  on  the  MOS  sexual 
functioning  scale.  Plate  A  on  Figure  4  shows  that  a  greater  proportion  of  participants  in 
the  tamoxifen  as  compared  to  the  placebo  group  reported  being  sexually  active  during 
the 6 months prior to each follow-up examination. Although apparently consistent, the
absolute difference was small (mean=0.78%) and the findings may have been due to
chance. Plates B-E show that a small, but consistently larger percentage of participants
in the tamoxifen group reported a definite or serious problem in three of the four
specific  domains  of  sexual  functioning  during  the  follow-up  period. 

Discussion 

The  cohort  of  women  taking  part  in  the  P-1  study  were  not  representative  of  the 
general  population.  They  were  predominately  white,  well-educated,  and  middle-class, 
with  a  strong  professional  and  technical  orientation.  The  initial  HRQL  findings 
presented  in  this  report  must  be  assessed  within  the  context  of  the  socioeconomic  and 
cultural  characteristics  of  the  P-1  study  cohort. 

Although 31.5% of our participants did not complete the 36-month HRQL
follow-up  examination,  we  have  shown  that  there  is  only  a  small  difference  in  the 
proportion  of  non-adherent  participants  in  the  tamoxifen  and  placebo  groups  and  that 
the  non-adherent  women  in  both  trial  groups  are  generally  similar  on  key  demographic, 
clinical and HRQL variables. Given these considerations, it seems unlikely that a
maximum  difference  of  3%  in  the  HRQL  follow-up  rates  between  the  two  groups  was 
sufficient  to  create  a  significant  bias  in  our  between  group  comparisons. 

Concern  has  been  expressed  regarding  the  possible  relationship  between 
tamoxifen  use  and  the  onset  of  depression.®'^^  Women  reporting  a  history  of  depressive 
episodes  or  a  history  of  treatment  for  nervous  or  mental  disorders  were  not  excluded 
from  the  trial.  If  tamoxifen  use  was  associated  with  the  onset  of  clinically  diagnosable 
depression,  we  would  have  expected  to  see  a  consistent  excess  of  individuals  scoring 
>16  on  the  CES-D  in  the  tamoxifen  group.  No  such  consistent  excess  was  observed. 



The  MOS  SF-36  served  in  this  study  as  a  measure  of  overall  health-related 
quality  of  life.  We  presented  data  from  this  instrument  in  terms  of  two  high-level 
component  scores  (PCS  and  MCS),  neither  one  of  which  demonstrated  any  clinically 
significant  differences  between  the  tamoxifen  and  placebo  groups. 

The  first  signs  of  consistent  differences  between  the  tamoxifen  and  placebo 
groups  were  observed  in  the  symptom  checklist  (SCL).  The  differences  between  the 
trial  groups  tended  to  be  associated  with  the  types  of  vasomotor,  gynecological,  and 
sexual  functioning  symptoms  previously  reported  for  tamoxifen.^®’ 

The  data  from  the  MOS  sexual  functioning  scale  indicate  that  relatively  small 
(<4.0%),  but  consistent  differences  exist  between  the  two  groups  with  regard  to  the 
proportion  of  women  reporting  definite  or  serious  problems  in  at  least  three  specific 
domains  of  sexual  functioning  -  sexual  interest,  arousal,  and  orgasm.  These  problems 
do  not  appear  to  be  age  group  specific.  Despite  these  findings  for  specific  domains  of 
functioning,  there  is  no  evidence  that  these  problems  result  in  a  reduction  in  the  overall 
proportion  of  women  in  the  tamoxifen  group  who  are  sexually  active. 

Based  on  these  data,  we  would  conclude  that  tamoxifen  use  is  associated  with 
an  increase  in  specific  vasomotor,  gynecological,  and  sexual  functioning  symptoms.  At 
the same time, we did not observe any evidence that overall physical or emotional
well-being was significantly affected by these differences in the frequency of symptoms.

We  also  found  no  evidence  on  the  CES-D  or  the  SF-36  mental  health  scale  for  an 
association  in  any  age  group  between  tamoxifen  use  and  an  increase  in  the  proportion 
of  women  reporting  clinically  significant  levels  of  depression. 

How  should  clinicians  integrate  these  research  results  into  decision-making  and 
recommendations  to  women  considering  the  use  of  tamoxifen  in  the  setting  of 
prevention? Many symptoms experienced by women who participated in this study are
age  and  menopause  related,  and  exist  independent  of  the  use  of  tamoxifen.  However, 
several  symptoms  are  substantially  more  frequent  in  women  using  tamoxifen  and  these 
include  vasomotor  symptoms  (cold  sweats,  night  sweats,  hot  flashes),  vaginal 
discharge,  and  genital  itching.  Women  need  to  be  informed  of  these  possible 
symptoms.  Weight  gain  and  depression,  two  clinical  problems  anecdotally  associated 
with  tamoxifen  treatment  in  women  with  breast  cancer,  were  not  increased  in  frequency 
in  this  large  placebo-controlled  trial  in  healthy  women.  This  is  good  news  that  must  also 
be  communicated  to  women. 

An  informed  discussion  with  a  woman  considering  tamoxifen  therapy  should 
include  these  points  in  the  risk-benefit  discussion.  Disclosure  of  likely  and  unlikely 
symptoms  should  prepare  a  woman  for  what  she  might  experience,  and  reduce  her 
anxiety  or  concerns  should  she  embark  on  preventive  therapy.  Should  a  woman 
experience  untoward  symptoms  after  starting  tamoxifen  treatment,  the  medication  can 
be  discontinued  if  the  symptoms  cannot  be  controlled  or  her  personal  assessment  of  the 
risks  and  benefits  changes. 

Table 1

Symptoms Reported At Least Once Between Months 3 and 36
With The Largest Relative Difference Between Trial Arms

Age group and Symptom          Placebo Arm      Tamoxifen Arm    Relative Risk
                               Proportion (%)   Proportion (%)   (TAM/Placebo)

35-49 yrs
  1. Cold sweats                   15.90            22.90            1.44
  2. Vaginal discharge             46.29            62.55            1.35
  3. Pain in intercourse           23.88            31.57            1.32
  4. Night sweats                  59.58            74.16            1.24
  5. Hot flashes                   65.54            81.28            1.24

50-59 yrs
  1. Cold sweats                   16.11            27.00            1.68
  2. Vaginal discharge             32.51            53.47            1.64
  3. Genital itching               36.93            45.24            1.23
  4. Night sweats                  62.77            75.88            1.21
  5. Bladder control (laugh)       47.67            56.94            1.19

>60 yrs
  1. Vaginal bleeding               4.64            10.92            2.35
  2. Vaginal discharge             19.82            45.81            2.31
  3. Genital itching               32.05            40.96            1.28
  4. Hot flashes                   51.51            63.59            1.23
  5. Bladder control (laugh)       49.88            56.49            1.13

Overall
  1. Vaginal discharge             34.13            54.77            1.60
  2. Cold sweats                   14.77            21.40            1.45
  3. Genital itching               38.29            47.13            1.23
  4. Night sweats                  54.92            66.80            1.22
  5. Hot flashes                   65.04            77.66            1.19
  6. Pain in intercourse           24.13            28.19            1.17
  7. Bladder control (laugh)       46.65            52.51            1.13
  8. Bladder control (other)       47.79            52.83            1.11
  9. Weight loss                   41.97            44.94            1.07
 10. Vaginal bleeding              21.26            21.96            1.03
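
The relative risk column in Table 1 is the tamoxifen proportion divided by the placebo proportion. A minimal sketch, using five of the overall rows from the table and hypothetical variable names, shows the calculation and the ranking by relative difference:

```python
# (symptom, placebo %, tamoxifen %) for selected "Overall" rows of Table 1
rows = [
    ("Hot flashes", 65.04, 77.66),
    ("Vaginal discharge", 34.13, 54.77),
    ("Night sweats", 54.92, 66.80),
    ("Cold sweats", 14.77, 21.40),
    ("Genital itching", 38.29, 47.13),
]


def relative_risk(placebo_pct, tamoxifen_pct):
    """Ratio of the tamoxifen proportion to the placebo proportion."""
    return tamoxifen_pct / placebo_pct


# Rank symptoms from the largest to the smallest relative difference.
ranked = sorted(
    ((name, round(relative_risk(p, t), 2)) for name, p, t in rows),
    key=lambda item: item[1],
    reverse=True,
)

for name, rr in ranked:
    print(f"{name:<18} {rr:.2f}")
```

Ranking on the ratio rather than on the absolute difference is why a relatively uncommon symptom such as cold sweats appears above hot flashes, even though hot flashes were reported far more often in both arms.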

Figure 1

Proportion of Participants in the Tamoxifen and Placebo Groups
Completing QoL Questionnaire by Examination¹
(N placebo = 5537 / TAM = 5527)

[Figure not reproduced; axis labels: Proportion Completing QoL, Examination]

1. Figures on chart are the number of women in the placebo/TAM groups
completing the QoL questionnaire and the difference between
TAM and placebo groups in terms of percent missing QoL data.

Figure 2

Proportion of P-1 Participants with CES-D Scores
at the Level of a Potential Case (>16) by Arm and Examination

[Figure not reproduced; axis label: Proportion Participants at Case Level]

Figure 3

Mean Scores by Age Group and Examination on
SF-36 Physical and Mental Component Scores
(higher scores represent better quality of life)

[Figure not reproduced; panels: Physical Component Scores and Mental Component Scores,
by age group (35-49 Years, 50-59 Years, 60+ Years); axis label: SF-36 PC Score]

Figure 4

Proportion of Women in the Tamoxifen and Placebo Arms
Reporting a Definite or Serious Problem in Past 4 Weeks
on MOS Sexual Functioning Scale
(Plates B.-E. refer only to women who reported
being sexually active in last 6 months)

[Figure not reproduced; axis label: Proportion Definite or Serious Problem; surviving plate titles follow]

A. Sexually Active Last Six Months
B. Lack of Sexual Interest
C. Difficulty Becoming Sexually Aroused
D. Unable to Relax and Enjoy Sex

METHODOLOGICAL ISSUES IN THE ANALYSIS OF QUALITY OF LIFE DATA
IN CLINICAL TRIALS: ILLUSTRATIONS FROM THE NATIONAL
SURGICAL ADJUVANT BREAST AND BOWEL PROJECT (NSABP)

JOSEPH P. COSTANTINO [1], WEI LANG [3] AND PATRICIA A. GANZ [4]

[1] University of Pittsburgh and the National Surgical Adjuvant Breast and Bowel Project, [2] University of
Pennsylvania School of Medicine, [3] Wake Forest University School of Medicine, [4] UCLA
Schools of Medicine and Public Health and the Jonsson Comprehensive Cancer Center

We present two Quality of Life (QOL) endpoints collected in conjunction with the recently completed
Breast Cancer Prevention Trial (BCPT) performed by the National Surgical Adjuvant Breast and
Bowel Project. The analyses of these endpoints (depression and hot flashes) indicate the importance
of randomization and give some insight into the impact of missing data in a large randomized
trial.

1.  Introduction 

Quality of life (QOL) assessments have been increasingly included as secondary or primary
endpoints in clinical trials (Tannock et al., 1996; Moinpour et al., 1998). The impetus for doing
so comes from a desire to obtain patient-rated evaluations of treatments, especially in
circumstances in which treatments have substantially differing toxicities or in which survival
outcomes are not expected to be different (Ganz, 1994a, 1994b). Under such circumstances, an
evaluation of the morbidity of treatment from the patient’s or participant’s perspective may in fact
be the most important endpoint. Although there is now a wide range of psychometrically
validated scales for the measurement of QOL in clinical trials (Cella and Bonomi, 1995), there are
considerable challenges to the implementation and collection of QOL data in these studies
(Bernhard et al., 1998a), as well as equally formidable statistical and analytical concerns
(Bernhard and Gelber, 1998b). In this paper, we provide examples from the recently completed
NSABP Breast Cancer Prevention Trial (BCPT) to highlight challenges that can arise in the
analyses of such data, specifically focusing on the importance of randomization and the issue of
missing data and its potential to affect the interpretation of QOL outcomes.

2.  Background 

The BCPT was a double-blinded, placebo-controlled clinical trial that was open for accrual from
June 1, 1992, through September 30, 1997. During this interval 13,388 women at high risk for
breast cancer were randomly assigned to receive either 20 mg/day of tamoxifen or placebo for a
duration of five years. The primary objective of the trial was to determine if tamoxifen therapy
would reduce the risk of breast cancer among women. Secondary objectives related to the full
benefit/risk profile of tamoxifen use in healthy women. Participants in the trial were screened for
breast cancer at six-month intervals by clinical breast examination and at yearly intervals by
bilateral mammography. At each screening visit, participants were also evaluated for several other
endpoints including heart disease, fractures, thromboembolic disease, and endometrial cancer.
Heart disease and fractures were included because it was theorized that tamoxifen might also
reduce the risk of these problems. Thromboembolic disease and endometrial cancer were included
because these were known side effects associated with tamoxifen therapy. As an additional
means to monitor the safety of treatment in the trial, the incidence of all invasive cancers and the
occurrence of all deaths were also included as endpoints.

The results of the BCPT have been reported (Fisher et al., 1998), as has a study of the
risk-benefit ratio for tamoxifen (Gail et al., 1999). During follow-up, 175 participants randomized
to receive placebo developed invasive breast cancer compared to only 89 randomized to receive
tamoxifen, indicating an estimated 50% reduction in the risk of breast cancer from the use of
tamoxifen. Other major findings include the detection of a preventive effect on osteoporotic
fractures, no effect on heart disease, and a confirmation of the known side effects of endometrial
cancer and thromboembolic disease. These overall findings will not be discussed in this
manuscript, as we wish to focus on issues that relate to QOL studies.

Because the participants in this trial were healthy women, the monitoring of their QOL
during the intervention was of particular importance. Thus, the NSABP included a concurrent
QOL study designed to describe side effects of tamoxifen, to examine the relationship between
side effects and QOL, to compare the side effects and QOL in placebo and treated subjects, and to
examine the effects of symptoms on compliance with study medication. The BCPT QOL
questionnaire was a 104-item battery that included four instruments: the Center for Epidemiologic
Studies Depression Scale (CES-D) (Radloff, 1977); the Medical Outcomes Study 36-Item Short
Form (SF-36) (Ware et al., 1994); a symptom checklist based on the Postmenopausal Estrogen
Progesterone Intervention (Shumaker S., personal communication) specifically adapted for the
BCPT; and the Medical Outcomes Study Sexual Problems Scale (Sherbourne, 1992). These
instruments were selected because of their psychometric characteristics and validity, the
availability of normative data in healthy women, and ease of self-administration. The latter was
particularly important because the trial was conducted at several hundred clinical centers
throughout North America and the battery of questions was completed on multiple
occasions in conjunction with study visits. The QOL assessment was scheduled to occur at
baseline before administration of the study medication and at every clinical visit during the five
years after randomization (at three months, at six months, and every six months thereafter).
However, the trial was unblinded on March 31, 1998, following an interim analysis that showed a
dramatic reduction in the incidence of breast cancer among the participants who received
tamoxifen. The QOL follow-up was terminated at that time due to the potential loss of the control
arm. In this manuscript, as in our prior analyses of QOL data from the trial (Day et al., 1999;
Ganz et al., 1998), we use QOL data available on participants who were recruited to the trial
during the first two years of the study (June 1, 1992 to May 31, 1994), as all of these women would
have been expected to have 36 months of completed follow-up data at the time the study was
terminated. The sample includes 11,064 women who represent 82.6% of the total accrual to the
BCPT. We use only their first three years of follow-up.

3.  The  Effect  of  Tamoxifen  on  Depression 

When the BCPT began, there was considerable concern that tamoxifen therapy might be
associated with the development of depressed mood in women with breast cancer. Although Love
et al. (1991) did not find such an effect when reporting symptoms associated with tamoxifen
treatment in a randomized trial in postmenopausal women with breast cancer, several researchers
subsequently reported results suggesting that administration of tamoxifen might lead to depression
in some breast cancer patients (Cathcart et al., 1993; Shariff et al., 1995; Moredo et al., 1994).
The latter studies were relatively small (fewer than 400 patients) and none had a placebo
comparison group. However, there was a potential scientific rationale for tamoxifen’s association
with depression. Estrogen had been shown to have a beneficial effect on mood in postmenopausal
women (Halbreich, 1997; Gregoine et al., 1996), and it was considered plausible that tamoxifen
might negate these positive effects of estrogen. Thus, careful measurement of depression,
including a screening instrument to identify potential cases of depression, was important in the
design of the BCPT QOL study.

The primary instrument used in the BCPT to study the change in depression level over
time was the CES-D, a self-administered questionnaire (20 questions) that screens for depressive
symptoms over the seven days prior to administration (Radloff, 1977). A participant’s score is
the sum of the responses for the 20 questions and can range from 0 (no depressive symptoms) to
60 (maximum depressive symptoms). The instrument is widely used because it is easy to
administer and has excellent population-based normative data (Myers and Weissman, 1980;
Roberts and Vernon, 1983; Boyd et al., 1982). To assess the validity of the CES-D in the BCPT
sample, we compared the baseline CES-D scores of BCPT participants with ten medical history
items related to mental health that had been obtained at entry to the trial (Table 1). The first three
items were obtained in the context of questions about diagnosed medical problems, although we
did not verify that there had been a recorded diagnosis. There is nearly a linear relationship
between the number of positives from the participant’s mental health history and the CES-D score
(p<0.0001), providing considerable reassurance that the CES-D score from this study sample was
highly associated with a clinical mental health history. Similarly, the association between the
mean CES-D score and the three depression-related items "ever had depression," "ever took
antidepressants" (either item 4 or 6 positive), or "any two years depressed or sad" showed an
increasing relationship between the CES-D score and the number of positives (p<.001). In
addition, the baseline CES-D scores were well balanced across placebo and tamoxifen treatment
assignment (Table 2). Cut-off points used in the table are somewhat arbitrary, although a cut-off
of 16 is commonly used as the minimum for classifying a person as depressed (Myers and
Weissman, 1980; Roberts and Vernon, 1983; Boyd et al., 1982) and Lyness et al. (1997) used the
cut-off of 22 when screening for major depression.


Table 1

BCPT Participant History Mental Health Items Obtained at Entry to the BCPT

Item                                               % Yes

Ever had depression                                   15
Ever had nervous or emotional disorder                 3
Ever had psychiatric problems                          1
Current antidepressants                                6
Current tranquilizers                                 16
Previous antidepressants                               4
Previous tranquilizers                                15
Two weeks sad, blue, depressed, disinterested         17
Any two years depressed or sad                         9
Depressed or sad most of past year                     5


Table 2. Baseline CES-D Scores

Score       Placebo (%)    Tamoxifen (%)

0-10           85.5            83.9
11-15           7.8             9.0
16-21           3.8             4.1
22-60           2.8             3.0
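
As a minimal sketch of the scoring rule described above (the sum of 20 item responses, ranging from 0 to 60) and of the score bands used in Table 2, the following Python fragment may be helpful. It assumes that each item is coded 0 to 3, which the 0 to 60 range implies, and that any reverse-scored items have already been recoded. The function names are illustrative and are not part of the BCPT analysis.

```python
from typing import Sequence

# Score bands as used in Table 2 (inclusive bounds).
CESD_BANDS = [(0, 10), (11, 15), (16, 21), (22, 60)]


def cesd_total(responses: Sequence[int]) -> int:
    """Sum 20 item responses, each assumed to be coded 0-3."""
    if len(responses) != 20:
        raise ValueError("CES-D has 20 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(responses)


def cesd_band(total: int) -> str:
    """Return the Table 2 band label for a total score."""
    for low, high in CESD_BANDS:
        if low <= total <= high:
            return f"{low}-{high}"
    raise ValueError("total must be between 0 and 60")


def is_potential_case(total: int, cutoff: int = 16) -> bool:
    """Flag scores at or above the commonly used cut-off of 16."""
    return total >= cutoff
```

A completed form with every item answered 1 would score 20, fall in the 16-21 band, and be flagged as a potential case under the cut-off of 16.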

In  Figure  1,  we  present  the  mean  CES-D  scores  by  visit  and  treatment  arm  during  the 
BCPT.  The  observed  increase  of  depression  among  participants  receiving  tamoxifen  is  slightly 
less  than  the  observed  increase  among  participants  receiving  placebo,  although  the  difference  is 
not significant (p=0.24). Thus, the increase in the depression score during the first six months of
the  trial  does  not  appear  to  be  related  to  the  administration  of  tamoxifen.  It  is  noteworthy  that  the 
dramatic  increase  in  scores  at  months  3  and  6  would  almost  certainly  have  been  attributed  to 
tamoxifen  had  there  not  been  a  placebo  arm.  This  illustrates  the  danger  of  trying  to  establish  a 
cause-and-effect  relationship  in  a  non-randomized  setting. 

FIGURE  1  ABOUT  HERE. 

We  do  not  know  why  the  CES-D  depression  scores  increased  for  participants  on  both 
arms  of  the  study  (placebo  and  tamoxifen).  It  is  possible  that  symptoms  of  worry  and  depression 
increased  due  to  the  controversy  surrounding  this  trial,  the  fear  and  uncertainty  of  taking  either 
placebo  or  active  agent,  an  increased  awareness  of  breast  cancer  risk,  or  a  concern  over  potential 
therapy side effects. Alternatively, the raised scores might be partially attributed to a “nocebo
effect” (Hahn, 1997): if an individual fears or believes that a side effect may occur from a
medication, he or she will report it. (As will be shown later, participants receiving placebo also
reported an increase in hot flashes, although the increase was not as large as among the
participants on tamoxifen.) Since neither group of women knew which pill they were taking, they
may have reported increased symptoms because they feared the potential medication side effects
described to them as part of the consent process. A third possibility is that the baseline scores were
artificially  low  and  the  subsequent  increase  reflected  a  regression  to  the  mean.  We  do  not  believe 
the  baseline  scores  are  much  lower  than  would  be  expected  for  the  educated,  socioeconomically 
advantaged  population  in  the  trial.  However,  to  the  extent  that  the  scores  were  artificially  low,  it 
could  be  either  that  women  were  less  likely  to  enter  the  trial  when  they  were  experiencing 
depressive  symptoms,  or  that  they  would  under-report  for  fear  of  jeopardizing  their  inclusion  in 
the  trial.  In  any  case,  the  phenomenon  of  an  early  increase  in  depressive  symptoms  appears  to  be 
independent  of  tamoxifen  use. 

However,  we  were  concerned  that  there  might  be  a  treatment  effect  in  the  subset  of 
subjects  at  higher  risk  of  depression.  Because  93%  of  the  participants  had  baseline  CES-D  scores 
<16,  and  85%  had  scores  <11,  such  an  effect  might  not  be  apparent  in  an  analysis  based  on  the 
entire  population.  To  explore  this  possibility  we  divided  the  women  into  four  groups  of  risk:  zero, 
one  to  two,  three  to  five,  and  six  to  ten  "yes"  responses  to  the  mental  health  items  listed  in  Table  1. 
There  was  no  difference  observed  between  tamoxifen  and  placebo  participants  in  any  of  the  four 
groups.  Results  were  similar  when  the  baseline  CES-D  score  was  used  to  create  risk  groups  (CES- 
D  scores  from  0  to  11;  12  to  15;  16  to  21;  or  22  or  more).  There  was  a  suggestion  that  tamoxifen 
is  beneficial  in  the  high-risk  group  (p=0.04),  although  this  is  likely  to  be  a  statistical  artifact. 

The problem of missing data is common in clinical trials that assess QOL (Bernhard et
al., 1998a). In the BCPT, this was exacerbated by the fact that the clinical centers were not
required to collect QOL data when a participant went off the study medication. As will be seen,
this led to a substantial problem of non-random missing data. Only 82 participants did not fill out
the CES-D form at entry (an extremely low rate of missing baseline data), and these participants
were excluded from subsequent analyses. (Questionnaires that were partially completed are
considered missing in this report.) However, of the possible 76,874 post-entry forms that the 10,982
remaining participants were expected to submit during the three-year period, 13,752 (18%) were
missing. At the end of the third year, slightly more than 30% were missing and participants who
received tamoxifen were more likely to have missing data (33% versus 30% missing, p<0.001).
The first three rows in Table 3 present the number and percent of missing forms preceded by a
protocol-specified event (such as second primary cancer, deep-vein thrombosis, ischemic heart
disease, or death); missing forms preceded by early termination of therapy for a reason not
specified by the protocol; and missing forms preceded by consent withdrawal by the participant.
The fourth row of this table shows the number of forms that were missing when the participant was
still receiving therapy. Figure 2 displays the percent of missing forms in four groups based on
baseline CES-D scores. Participants who began with an elevated CES-D score were more likely
to have missing data (p<0.001 at three years).
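
The counts quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes seven post-entry assessments per participant over three years (months 3, 6, 12, 18, 24, 30, and 36, following the visit schedule given earlier); the variable names are illustrative only.

```python
# Counts quoted in the text.
cohort = 11_064          # women recruited in the first two years
missing_baseline = 82    # no CES-D form at entry; excluded from analysis
post_entry_visits = 7    # months 3, 6, 12, 18, 24, 30, 36 (assumed schedule)
missing_forms = 13_752   # post-entry forms not received

analysed = cohort - missing_baseline
expected_forms = analysed * post_entry_visits
missing_fraction = missing_forms / expected_forms

print(analysed)                        # 10982
print(expected_forms)                  # 76874
print(round(100 * missing_fraction))   # 18
```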

INSERT  FIGURE  2  AND  TABLE  3  ABOUT  HERE. 

The average of the CES-D scores immediately preceding a missing score was higher than
the average of the CES-D scores immediately preceding an observed score (Table 4), which raised
the possibility that missing scores would have been higher than concurrently observed scores. The
differences were almost identical in the tamoxifen and placebo arms, indicating that while the
missing data might result in an underestimate of depression, the bias would be the same in both
arms. When we considered other functions of preceding scores, we found that none had a stronger
association with missing scores than did the immediately preceding score. In particular, the slopes
between two scores preceding a missing score were not significantly different from slopes
between two scores preceding an observed score. Therefore we considered some simple
imputations based on the scores immediately preceding the missing scores.

INSERT  TABLE  4  ABOUT  HERE. 


In discussing the imputation methods, we will use the following notation. The baseline
and seven post-entry CES-D scores for the jth individual participant will be represented by the
vector $x_j = (x_{0j}, x_{1j}, x_{2j}, x_{3j}, x_{4j}, x_{5j}, x_{6j}, x_{7j})$, where "missing" is a
possible value for the CES-D score. Let $\bar{X}_i^T$ ($\bar{X}_i^P$) be the average CES-D among
tamoxifen (placebo) participants with an observed CES-D score at the ith visit. We define a new set
of vectors by $x'_j = (x_{0j}, x'_{1j}, x'_{2j}, x'_{3j}, x'_{4j}, x'_{5j}, x'_{6j}, x'_{7j})$, where
$x'_{ij} = x_{ij}$ if $x_{ij}$ is observed. If $x_{ij}$ is missing,
$x'_{ij} = x'_{i-1,j} - \bar{X}_{i-1}^T + \bar{X}_i^T$ for a tamoxifen participant and
$x'_{ij} = x'_{i-1,j} - \bar{X}_{i-1}^P + \bar{X}_i^P$ for a placebo participant, where the
imputation begins with $x'_{1j}$, then $x'_{2j}$, and so forth. The mean CES-D
curves are slightly higher than in Figure 1 (where no imputation is involved), but the differences
between the two curves remain nearly identical to the differences seen in Figure 1.
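
The sequential imputation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a missing score is replaced by the participant's previous (observed or imputed) score plus the change in the arm's observed mean between the two visits, the baseline score is always observed, and arm means are computed from observed scores only. The function name and data layout are hypothetical.

```python
from typing import List, Optional

Scores = List[Optional[float]]  # one participant: baseline + follow-up visits


def impute_arm(arm: List[Scores]) -> List[List[float]]:
    """Impute missing scores (None) for all participants in one arm."""
    n_visits = len(arm[0])

    # Mean of the observed scores at each visit, within this arm.
    means = []
    for i in range(n_visits):
        observed = [p[i] for p in arm if p[i] is not None]
        if not observed:
            raise ValueError(f"no observed scores at visit {i}")
        means.append(sum(observed) / len(observed))

    imputed = []
    for p in arm:
        if p[0] is None:
            raise ValueError("baseline score must be observed")
        filled = [p[0]]
        # Work forward in time so an imputed value can feed the next visit.
        for i in range(1, n_visits):
            if p[i] is not None:
                filled.append(p[i])
            else:
                filled.append(filled[i - 1] - means[i - 1] + means[i])
        imputed.append(filled)
    return imputed


# Toy arm with two participants and three visits (values are made up).
toy_arm = [
    [4.0, 6.0, 8.0],
    [10.0, None, None],
]
print(impute_arm(toy_arm))  # [[4.0, 6.0, 8.0], [10.0, 9.0, 11.0]]
```

In the toy example the second participant keeps the three-point offset from the arm mean seen at baseline, which is the sense in which the prior score is "adjusted for the mean for the visit". Each arm would be passed to the function separately.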

Although Table 4 suggests that the imputed values defined above would be appropriate for
replacing missing values, we cannot rule out the possibility that the missing values mask a greater
increase in depression for tamoxifen participants than for placebo participants. For example, there
might have been a subset of tamoxifen participants who became depressed as a result of the
treatment and dropped out before this effect could be observed. We do not have data available to
verify that this is not the case. In order to see just how great a differential (by treatment) would
have been required to change the interpretation of the data, we performed three sensitivity
analyses.

For  the  first  sensitivity  analysis,  we  imputed  missing  values  as  defined  above,  but  for 
every  missing  value  of  a  tamoxifen  participant  we  added  0.5  units  to  the  imputed  value.  The 
resultant  mean  values  of  CES-D  at  each  assessment  are  almost  the  same  between  treatment  arms. 
This is somewhat reassuring, since adding 0.5 units to each missing CES-D score for tamoxifen
participants  and  none  for  placebo  participants  is  extreme.  As  Table  3  indicated,  the  status  of  the 
participants  with  missing  forms  was  similar  on  both  arms.  In  instances  in  which  institutions 
reported  the  reason  participants  went  off  study,  only  3%  reported  depression  as  the  reason  for 
doing  so. 

The  second  sensitivity  analysis  was  based  on  a  partitioning  of  missing  questionnaires  into 
those  that  were  missing  for  a  variety  of  non-treatment-related  reasons  and  those  that  were  missing 
for  treatment-related  reasons.  Specifically,  we  assumed  that  if  m  questionnaires  were  missing  (at 
a  particular  assessment  time)  in  the  placebo  arm,  and  m  +  X  questionnaires  were  missing  in  the 
tamoxifen  arm,  then  some  fraction  of  the  X  questionnaires  might  be  attributable  to  excess 
depression  caused  by  tamoxifen.  We  calculated  treatment  group  means  (at  each  assessment  time) 
as if some fraction r (for various candidate values of r) of the missing tamoxifen scores were
replaced with the mean of all observations at that assessment that were at least 16, since these
represent severe depressive symptoms. The remaining missing values in both arms were replaced
with the mean of all observations at that assessment. At r = 1/2, the curves of imputed CES-D for
the  two  treatment  groups  overlapped  [not  shown].  That  is,  there  did  not  appear  to  be  a  tamoxifen- 
related  increase  in  CES-D  unless  greater  than  half  of  the  excess  missing  questionnaires  were 
assumed  to  coincide  with  severe  depressive  symptoms. 
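
A rough sketch of this second sensitivity analysis, for a single assessment time, is given below. The text leaves some details open, so the sketch makes explicit assumptions: the fraction r is applied to the excess X of missing tamoxifen questionnaires over missing placebo questionnaires, and both replacement means are computed from the pooled observed scores of the two arms. The function and argument names are hypothetical.

```python
from typing import List, Tuple


def sensitivity_means(
    tam_obs: List[float],
    pla_obs: List[float],
    tam_missing: int,
    pla_missing: int,
    r: float,
    cutoff: float = 16.0,
) -> Tuple[float, float]:
    """Arm means at one assessment after the assumed replacement."""
    pooled = tam_obs + pla_obs
    overall_mean = sum(pooled) / len(pooled)
    high = [s for s in pooled if s >= cutoff]
    if not high:
        raise ValueError("no observed scores at or above the cut-off")
    high_mean = sum(high) / len(high)

    excess = max(tam_missing - pla_missing, 0)  # the X excess forms
    n_high = r * excess                         # assumed depression-related
    n_rest = tam_missing - n_high               # all other missing forms

    tam_total = sum(tam_obs) + n_high * high_mean + n_rest * overall_mean
    pla_total = sum(pla_obs) + pla_missing * overall_mean

    tam_mean = tam_total / (len(tam_obs) + tam_missing)
    pla_mean = pla_total / (len(pla_obs) + pla_missing)
    return tam_mean, pla_mean


# Made-up scores: two observed forms per arm, two excess missing on tamoxifen.
print(sensitivity_means([4.0, 20.0], [4.0, 12.0], 2, 0, r=0.0))  # (11.0, 8.0)
print(sensitivity_means([4.0, 20.0], [4.0, 12.0], 2, 0, r=0.5))  # (13.5, 8.0)
```

Repeating the calculation over a grid of r values at each assessment shows how large r must be before the tamoxifen curve separates from the placebo curve.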

All of the analyses shown above were also carried out for a binary outcome of severe
depressive symptoms, defined as any CES-D score ≥16. In Figure 3A, we plot the proportion of
values classified as a "yes" as a function of time and again find no tamoxifen effect. Imputation of
the missing values using preceding scores had minimal impact on our findings. For a sensitivity
analysis, we performed the imputation with the additional assumption that 3.2% of the missing
tamoxifen CES-D forms had a score ≥16, even though the prior score was <16. This would be
roughly equivalent to assuming that all the tamoxifen participants who reported depression before
dropping out of the study subsequently had a score ≥16, while none of the placebo participants
reporting depression before dropping out had a score exceeding 15. The sensitivity analysis,
presented  in  Figure  3.B,  indicates  that  under  this  fairly  extreme  assumption  about  the  drop-outs, 
the  two  curves  would  essentially  overlap. 

INSERT FIGURES 3A AND 3B HERE.

As a final step in the sensitivity analysis, we considered a model-based method that adjusts for
drop-out related to observed and unobserved CES-D outcomes through subject-level random
effects. This approach, which may be used to adjust for other covariates, has been presented
previously in other randomized trial contexts for continuous data (Schluchter, 1992; DeGruttola and
Tu, 1994) and for binary data (Ten Have et al., 1998), and in a cohort study context for ordinal
data (Ten Have et al., 2000). More specifically, we fitted an ordinal logistic model with random
effects to the CES-D outcome data. The CES-D score was categorized as in Table 2. The models
make the proportional odds assumption, that is, the odds ratio specified for a given cut-point of the
ordinal CES-D scale is the same as the odds ratio specified for every other cut-point. This
approach is not designed for intermittent missingness. Therefore, any participant's data
subsequent to a missing form were deleted for the purpose of this analysis. The model comprised
three components consisting of different covariate effects but sharing the same subject-level
random effect structure. The first was an ordinal CES-D outcome component with treatment arm
and time (7 degrees of freedom) as main effects, and their interaction (7 degrees of freedom). The
second and third model components corresponded to separate discrete survival time logistic
specifications for non-protocol and protocol-specified drop-out. Each of these drop-out
components included main effect covariates corresponding to treatment arm and time.

We  present  results  based  on  two  versions  of  each  of  these  drop-out  components.  The  first 
version  includes  as  covariates  the  CES-D  outcome  before  drop-out  and  its  interaction  with 
treatment  arm  and  type  of  dropout  (protocol  vs.  non-protocol).  In  the  second  version,  each  of  these 
drop-out  components  excludes  the  CES-D  outcome  and  its  interactions.  The  ensuing  results  are 
based  on  these  model  specifications  without  baseline  covariates.  Including  baseline  age  did  not 
alter  the  results.  The  subject-level  random  effect  structure  shared  by  the  CES-D  and  drop-out 
components  induces  a  relationship  between  the  CES-D  observed  and  unobserved  outcomes  and 
the  risk  of  drop-out.  The  magnitude  of  this  relationship  is  characterized  by  the  specification  of 
separate  variance  components  of  the  random  effect  for  each  of  the  three  components  in  the  model. 
Separate  large  variance  components  for  the  outcome  component  and  for  a  drop-out  component 
indicate  a  strong  relationship  between  outcome  and  the  respective  form  of  drop-out.  For 
comparison,  we  also  present  results  based  on  the  assumption  that  drop-out  is  missing  at  random 
(MAR).  That  is,  drop-out  is  conditionally  independent  of  the  unobserved  CES-D  outcomes, 
conditioned on all observed data (Little, 1995). In summary, we have used these three models: 1)
the random effects logistic model without a drop-out component, under the assumption that
drop-out is missing at random (naive model); 2) the random effects logistic model augmented with a
discrete time survival logistic model for drop-out, which shares a random effect with the ordinal
CES-D outcome (Joint 1 model); and 3) model #2 with the last observed CES-D outcome added as
a covariate (Joint 2 model).
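
One way to write down the shared random-effect structure described above is sketched below. The notation is illustrative and is not taken from the original analysis: it assumes a standard normal subject-level effect b_j scaled by a separate parameter in each component, categorical time effects in every component, and a treatment indicator T_j. The naive model corresponds to the first line alone; the Joint 2 model would add the last observed CES-D category (and its interactions) to the drop-out lines.

```latex
% Y_{ij}: ordinal CES-D category (Table 2 bands) of participant j at visit i
% D^{(d)}_j: visit at which drop-out of type d occurs for participant j
% T_j: treatment indicator; b_j ~ N(0, 1): shared subject-level random effect
\begin{align*}
\operatorname{logit} P(Y_{ij} \le k \mid b_j)
  &= \alpha_k - \bigl(\beta T_j + \tau_i + \gamma_i T_j + \sigma_Y b_j\bigr),
  \qquad k = 1, 2, 3, \\
\operatorname{logit} P\bigl(D^{(d)}_j = i \mid D^{(d)}_j \ge i,\ b_j\bigr)
  &= \eta^{(d)}_0 + \eta^{(d)}_1 T_j + \zeta^{(d)}_i + \sigma_d b_j,
  \qquad d \in \{\text{non-protocol}, \text{protocol}\}.
\end{align*}
% Proportional odds: beta, tau_i and gamma_i do not depend on the cut-point k.
% tau_i and gamma_i (i = 1, ..., 7) carry 7 degrees of freedom each.
```

Under this parameterization, a drop-out component with its scale parameter near zero is effectively unrelated to the outcome through the random effect, which is how the variance components are read in the discussion that follows.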

The  likelihood  ratio  test  of  treatment  arm  differences  in  change  across  time  (7  degrees  of 
freedom)  was  not  significant  (p=.14).  As  Table  5  suggests,  this  result  was  robust  with  respect  to 
the  drop-out  assumptions  (e.g.,  MAR).  More  specifically,  the  estimates  of  the  log  treatment  odds 
ratio  at  baseline  and  corresponding  treatment-time  interaction  terms  at  each  follow-up  time  differ 
very little across the three models. To evaluate the strength of the relationship between outcome
and drop-out, we present the variance components of the random effect shared by the three
components (outcome and the two drop-out types, non-protocol- and protocol-defined) for two of
the models, Joint 1 and Joint 2, in Table 6. Note that the naive model only has the outcome component and
therefore only one variance component. Table 6 shows that neither of the drop-out components in
the Joint 1 and Joint 2 models is related to the outcome through a random effect. This lack of
relationship between outcome and drop-out is consistent with the fact that the log odds ratio
estimates in Table 5 are very stable between the naive and joint models. This suggests that the
naive random effects model accommodates the relationship between outcome and protocol-
defined drop-out. That is, the MAR relationship under the naive model characterizes the type of
relationship  between  drop-out  and  outcome  represented  by  the  joint  models.  Of  course,  it  may  be 
that  a  different  relationship  exists  that  is  not  characterized  by  either  the  joint  or  naive  models. 


INSERT  TABLES  5  AND  6  ABOUT  HERE. 

In  summary,  our  study  data  indicate  that  tamoxifen  does  not  influence  depressive 
symptoms  among  women  who  are  at  high  risk  for  breast  cancer,  and  there  is  no  indication  that 
missing  data  masked  an  effect.  It  appeared  that  the  missing  data  did  result  in  slight  underestimates 
of  the  CES-D  scores,  which  were  increased  following  imputation. 

4.  Strategies  for  the  Evaluation  of  Missing  Data:  Hot  Flashes 


Although  tamoxifen  did  not  appear  to  influence  the  CES-D  score  in  this  study,  it  clearly  was 
associated  with  other  symptoms.  Numerous  studies  have  shown  that  tamoxifen  increased  the 
number  and  severity  of  hot  flashes  in  women  being  treated  for  cancer,  and  this  effect  was  also  seen 
in the high-risk women participating in the BCPT (Day et al., 1999). Hot flashes were the most
commonly reported symptom on either arm of the BCPT.

In  Figure  4  (solid  lines),  we  present  the  score  reported  by  these  women  for  hot  flashes  at 
each  cycle  by  treatment  (possible  values  ranged  from  0=none  to  4=extreme).  There  is  a  clear 
increase  in  this  symptom  associated  with  tamoxifen  throughout  the  study.  (Note  that  participants 
taking  placebo  also  report  an  increase  in  mean  hot  flash  score,  although  this  increase  is  not  as  great 
as  for  those  taking  tamoxifen.  This  may  be  another  example  of  the  nocebo  effect.)  Differences  in 
hot  flashes  due  to  treatment  are  highly  significant  (p<.001)  at  every  visit.  However,  when  hot  flash 
scores  immediately  preceding  a  missing  value  were  compared  to  the  scores  immediately  preceding 
an  observed  value  (Table  7),  there  was  a  differential  effect  according  to  treatment.  We  again  did 
an  imputation  in  which  missing  values  were  replaced  by  the  prior  score  adjusted  for  the  mean  for 
the  visit  (as  described  previously  for  the  CES-D  analyses).  There  is  still  clear  evidence  of  a 
tamoxifen  effect  (dashed  lines  in  Figure  4),  but  the  values  for  the  tamoxifen  curve  are  slightly 
lower  than  when  the  missing  values  are  omitted,  while  the  values  for  the  placebo  curve  remain 
nearly  unchanged,  indicating  that  we  might  be  slightly  overestimating  the  treatment  effect  if  we 
ignore  missing  values.  For  example,  the  difference  in  average  scores  is  .30  at  three  years  when 
missing  data  are  ignored  versus  .26  following  the  imputation. 
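
The imputation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name, the dictionary layout, and the choice to compute visit means from observed scores within a single treatment arm are all assumptions. It follows the reading given in the Figure 4 legend, in which a missing score is replaced by the participant's last observed score plus the change in the arm mean between that visit and the missed visit.

```python
def impute_adjusted_locf(scores_by_participant):
    """Impute missing visit scores for ONE treatment arm (hypothetical helper).

    scores_by_participant: dict mapping participant id -> list of scores,
    one per visit, with None marking a missing questionnaire.
    """
    n_visits = len(next(iter(scores_by_participant.values())))

    # Arm mean at each visit, computed from observed scores only.
    visit_means = []
    for v in range(n_visits):
        observed = [s[v] for s in scores_by_participant.values() if s[v] is not None]
        visit_means.append(sum(observed) / len(observed) if observed else None)

    imputed = {}
    for pid, scores in scores_by_participant.items():
        filled = list(scores)
        last_v = None  # index of the most recent observed visit
        for v, score in enumerate(scores):
            if score is not None:
                last_v = v
            elif last_v is not None and visit_means[v] is not None:
                # Prior observed score, shifted by the change in the arm mean.
                filled[v] = scores[last_v] + (visit_means[v] - visit_means[last_v])
        imputed[pid] = filled
    return imputed


# Toy data on the 0-4 hot flash scale (invented for illustration only).
arm = {"a": [0, 1, 2], "b": [2, 3, 4], "c": [1, None, 3], "d": [None, 2, 3]}
result = impute_adjusted_locf(arm)
```

A missing value with no earlier observed score (participant "d" at the first visit) is left missing, since there is no prior score to carry forward.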

INSERT  FIGURE  4 


Table 7. Average hot flash score prior to missing versus observed scores

               Missing subsequent     Observed subsequent
               questionnaire          questionnaire

Placebo        6.87                   0.77
Tamoxifen      1.12                   1.16

An  alternative  analysis  of  these  data  based  on  the  informative  drop-out  model  used  for 
the  CES-D  revealed  a  significant  difference  between  the  treatment  arms  with  respect  to  change  at 
each  follow-up  time  (p<.001).  As  with  the  CES-D  non-significant  treatment  difference,  this 
significant  result  was  robust  with  respect  to  drop-out  assumptions  under  the  random  effects  ordinal 
logistic  model.  The  logistic  model  requires  the  assumption  that  the  relationship  between 
symptoms and drop-out risk is in the same direction in both the placebo and tamoxifen groups and
over  time.  As  Table  8  indicates,  this  assumption  did  not  hold  for  the  hot  flash  data.  Hence,  we 
were  unable  to  adjust  for  the  observed  drop-out  pattern  to  obtain  valid  estimates  of  the  treatment 
effect. 
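
For orientation, a joint model of the kind described (a random effects ordinal logistic model for the symptom score, linked to a discrete-time logistic model for drop-out through a shared random effect) can be written generically as below. The notation is illustrative and assumed rather than taken from the analysis: Y_ij is the ordinal score of participant i at visit j, D_i is the visit at which participant i drops out, x_ij and z_ij are covariate vectors including treatment and time, and b_i is the shared random intercept.

```latex
\begin{align}
\operatorname{logit} P(Y_{ij} \le k \mid b_i) &= \alpha_k - \left(x_{ij}^{\top}\beta + b_i\right), \\
\operatorname{logit} P(D_i = j \mid D_i \ge j,\; b_i) &= \gamma_j + z_{ij}^{\top}\delta + \lambda\, b_i, \\
b_i &\sim N\!\left(0, \sigma_b^{2}\right).
\end{align}
```

In this sketch a single coefficient λ links symptom level to drop-out risk with one sign for both arms and all visits, which corresponds to the same-direction assumption stated above. A pattern in which the direction differs by arm would require λ to vary by treatment, and possibly by visit.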

INSERT TABLE 8 ABOUT HERE

5. Conclusions

Several points became clear in the analysis of the CES-D data. Perhaps the most important is that
one would be likely to conclude that tamoxifen increased depressive symptoms if all the
participants had received tamoxifen, as this would appear to be the most likely cause of the
immediate increase in depressive symptoms. However, the randomization allowed us to see that
the increase was comparable when participants received placebo, ruling out tamoxifen as the
cause. The fact that the prior scores associated with missing values were elevated in both arms
indicated that the degree of depressive symptoms might have been underestimated slightly on both
arms. However, the elevation was the same in both arms, which made it unlikely that there was a
differential drop-out effect by treatment. This partially explains why imputation analyses still led
to the conclusion that tamoxifen did not result in increased depressive symptoms. Sensitivity
analyses indicated that even if there were a fairly substantial treatment-related difference in the
depressive symptoms among the drop-outs, accounting for this differential effect would not
change the conclusion that the depressive symptoms were not treatment related.

The  situation  was  slightly  different  for  the  hot  flash  outcome.  There  was  a  clear 
substantial  effect  of  tamoxifen  on  the  incidence  and  severity  of  hot  flashes.  Furthermore,  there 
was  evidence  of  a  differential  drop-out  effect  by  treatment.  Imputation  indicated  that  this  resulted 
in  a  small  overestimate  of  treatment  effect.  The  rather  unusual  relationship  between  drop-outs  and 
treatment  presented  in  Table  8  would  require  fairly  flexible  models  if  one  were  to  estimate  and 
make  inference  regarding  the  effect.  In  future  methodology  studies,  we  will  address  ways  to 
handle  this  drop-out  pattern. 
Figure Legends:

Fig.  1.  Change  from  baseline  score  for  depression  in  participants  in  the  BCPT.  Depression  is  slightly 
increased  in  the  placebo  group,  compared  to  the  tamoxifen  group  (not  statistically  significant). 


Fig. 2. The percent of missing questionnaires at each visit by baseline CES-D group (0-10, 11-15,
16-21, and 22-60). The percent missing is higher for subjects with higher baseline CES-D scores.


Fig.  3.  A.  Increase  in  percent  of  participants  whose  CES-D  score  was  at  least  16,  minus  the  percent 
at  baseline.  The  percent  increased  in  both  arms. 

B.  Effect  of  missing  data.  Increase  in  percent  of  participants  whose  CES-D  score  was  at  least 
16,  after  imputation  with  the  previous  observed  score,  adjusting  for  the  difference  in  treatment  arm 
means  between  the  missed  visit  and  the  preceding  visit.  The  imputed  observations  in  the 
tamoxifen  arm  had  an  additional  3.2%  added,  and  the  resulting  curves  are  nearly  overlapping. 

Fig. 4. The mean hot flash score after subtraction of each participant's baseline score, by treatment arm,
with missing values omitted (solid lines) and after imputation of missing values (dashed lines).
Tamoxifen subjects experienced more severe hot flashes. For each subject, missing values were
imputed with the previous observed value, adjusting for the difference in treatment arm means
between the missed visit and the preceding visit. Imputation did not substantially change the
comparison.
* A score of 16 or higher is considered an indicator of
depression.


Effect  of  missing:  For  CES-D  scores,  estimates  of  baseline  treatment  log 
odds  ratio  and  corresponding  interactions  between  treatment  and  time  for 


MAR  (naive  model);  2)  the  random  effects  logistic  model  augmented  with  a  discrete  time  survival  logistic  model 
for  drop-out  that  shares  a  random  effect  with  the  ordinal  symptom  outcome  (joint  1  model);  3)  model  2)  with  the 
last observed symptom outcome added as a covariate (joint 2 model).


Table  6: 

For  CESD,  estimates  of  variance  components  of  random  intercepts  for 
Difference in Mean Preceding
uld be appropriate if the values all had the same si



Figure 1 (y-axis: CES-D change; x-axis: visit time in months)

Figure 2 (y-axis: % missing; x-axis: visit time in months)

Figure 3.A (y-axis: change in % > 16; x-axis: visit time in months)

Figure 3.B (y-axis: change in % > 16; x-axis: visit time in months)

Figure 4 (x-axis: visit time in months)


Appendix  4 


Kiebert G, Wait S, Bernhard J, Bezjak A, Cella D, Day R,
Houghton J, Moinpour C, Scott C, Stephens R. Practice and
policy of measuring quality of life and health economics in
cancer clinical trials: a survey among co-operative trial groups.
Quality of Life Research 2000; 9(10):1073-80.




Quality of Life Research 9: 1073-1080, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.


Practice  and  policy  of  measuring  quality  of  life  and  health  economics 
in  cancer  clinical  trials:  A  survey  among  co-operative  trial  groups 

G. Kiebert, S. Wait, J. Bernhard, A. Bezjak, D. Cella, R. Day, J. Houghton, C. Moinpour,
C. Scott & R. Stephens

MEDTAP International, London, UK; Quality of Life Taskforce in Oncology, Novartis Pharma, Basel,
Switzerland; Swiss Group for Clinical Cancer Research, and International Breast Cancer Study Group, Bern,
Switzerland; National Cancer Institute of Canada, Clinical Trials Group, Toronto, Ontario, Canada; Eastern
Cooperative Oncology Group, Evanston, IL, USA; Gynecologic Oncology Group, Evanston, IL, USA;
National Surgical Adjuvant Breast and Bowel Project, Pittsburgh, PA, USA; Cancer Research Council and
UCL Cancer Trials Centre, London, UK; Southwest Oncology Group, Seattle, WA, USA; Radiation Therapy
Oncology Group, Philadelphia, PA, USA; Medical Research Council, Clinical Trial Unit, London, UK

Accepted  in  revised  form  2  February  2001 


Abstract 

Background: Co-operative groups have played an important role in the advance of health-related quality of
life (HRQL) research. However, definitions of the concept, criteria for selection of existing instruments and
methods for data collection and interpretation remain poorly defined in the literature. A survey was
conducted amongst the major cancer co-operative groups in order to gain a better understanding of their
current policy and processes to ensure optimal HRQL data collection within cancer clinical trials. The topic
of health economics was similarly addressed. Methods: A written questionnaire was addressed to 16 major
European and North American cancer co-operative groups. Eleven groups responded (response rate: 69%);
however, one group could not provide information for the survey, thus ten questionnaires were available
for analysis. Results: The results from this survey among co-operative groups show that HRQL (more than
health economics) is recognized as an important, although usually secondary, outcome measure in
oncology trials. On the whole, co-operative groups have a rather flexible policy towards the inclusion of
HRQL (and HE) into their clinical trials, and practice is very much on a case-by-case basis, but they use
standard practice guidelines and internal procedures to ensure well-defined study protocols and enhance
good quality studies.

Key words: Cancer, Co-operative group, Health economics, Randomized controlled clinical trials, Quality
of life


Introduction 

In chronic diseases where cure is often not achievable, it has long been recognized that
improvement in health-related quality of life (HRQL) is of great importance. Oncology was one
of the first disease areas where the importance of HRQL as an outcome measure was acknowledged;
in the US, HRQL outcomes were first included in large treatment and prevention trials in
cardiovascular disease. Over the past ten years, there has been an increasing emphasis on the role
of alternate outcomes other than the classical clinical trial endpoints of response rate, disease-free
or overall survival. Since most trials take many years to mature, it is only now that gradually more
and more publications of clinical trials include HRQL.

This study was supported by an unrestricted financial grant from Novartis Pharma AG.

Co-operative groups are playing an increasingly important role in the advancement of cancer care
through the conduct of clinical trials, and the establishment of treatment recommendations and
guidelines. Collaborative trials groups have also been active proponents of quality of life research.
For instance, the proceedings of a workshop focusing on practical and methodological issues
related to missing quality of life data in clinical trials, in which all major co-operative trial groups
participated and contributed, were recently published as a special issue in Statistics in Medicine [1].

An informal review of existing literature indicated that many of the large oncology co-operative
groups have some kind of policy or guidelines for the inclusion of HRQL as an endpoint in cancer
clinical trials. However, the overall information from existing publications is scarce, incomplete
and not up-to-date. In particular, information on criteria for selection of existing instruments,
methods for assessment, and data collection procedures and instructions is lacking. For this
reason, a survey was done of the major co-operative groups (i) that conduct clinical studies in more
than one type of cancer or (ii) that focus on a single type of cancer but whose scope and
membership are pan-continental.

The  objective  of  this  survey  was  to  obtain  up-to- 
date  information  on  the  co-operative  group  policy 
on  HRQL  research.  Since  health  economics  (HE), 
specifically  resource  utilization  data  collection,  is 
gradually  being  evaluated  in  cancer  clinical  trials, 
our  survey  addressed  this  as  an  additional  topic. 

The survey was developed and conducted within the context of a special multidisciplinary taskforce,
whose mandate was to develop internal guidelines on HRQL evaluation within oncology clinical
trials at a large pharmaceutical company. Recognizing the prominent role that co-operative groups
have played in HRQL research in oncology, the taskforce felt that it was essential to look to these
groups for 'state of the art' processes and strategies to ensure optimal HRQL data collection within
clinical trials.


Methods 

The target group consisted of all major national or international co-operative groups that conduct
studies in more than one type of cancer and multi-continental groups focusing on one type of cancer.
The first step involved the identification of the key person in each co-operative group responsible for
quality of life issues who could respond to the questionnaire on behalf of the co-operative group.
This step was performed by telephone survey by the principal study investigator (GK). For all
groups this key person is a specialized quality of life researcher. Once the key person was identified,
this person was sent a cover letter stating the objective and content of the survey, an invitation to
participate, and a request to return the completed questionnaire within six weeks. A written reminder
was sent to all non-responders after six weeks. Three weeks thereafter, the remaining
non-responders were contacted by telephone and, in one case, by fax.

The final response rate was 11 out of 16. Three groups did not respond, and two groups refused:
one because of time constraints (Cancer and Leukemia Group B (CALGB)) and one because of
concerns about confidentiality of information (European Organisation for the Research and
Treatment of Cancer (EORTC)). One group was willing to participate, but at the time of the survey
the information was not readily available for organizational reasons. Thus, a total of ten
questionnaires were available for analysis. Table 1 provides an overview of the groups that were
approached and their responses to our invitation to participate in the survey.

The  questionnaire  was  developed  especially  for 
this  survey.  A  listing  was  made  of  all  relevant 
topics  for  which  we  intended  to  collect  data.  In  a 
second step a set of questions was formulated
addressing  all  different  aspects  of  each  topic.  A 
draft  version  of  the  survey  was  reviewed  by 
members  of  the  taskforce  experienced  in  the 
development  of  questionnaires. 

The questionnaire addressed the following topics: overview of ongoing clinical trials with and
without HRQL in the most prevalent types of cancer; co-operative group trial selection policy;
procedures and methods for inclusion of HRQL into clinical trials; study center training and
guidelines for HRQL data collection; data analysis and reporting of findings. The same questions
were asked for HE. The results of the survey are discussed below in this order of topics.
Results

Overview  of  ongoing  clinical  trials 

The most numerous ongoing clinical trials are those in gynecological, breast, lung, prostate and
colorectal cancers (Table 2). In more than half of these trials, HRQL is evaluated, although usually
as a secondary endpoint, and only seldom as the primary endpoint. Notable exceptions are trials
that evaluate best supportive care, where HRQL is the primary endpoint in six out of eight trials. HE
endpoints are much less frequently collected in the reported trials.
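
The "more than half" statement can be checked against the counts in Table 2 with a short tally. This is a sketch for the reader's convenience: the rows are transcribed from Table 2, the function name is hypothetical, and the calculation assumes that no trial is counted under both the primary and the secondary endpoint columns.

```python
# Rows transcribed from Table 2:
# (cancer site, total trials in 1998, HRQL primary, HRQL secondary, including HE)
TABLE_2 = [
    ("Brain", 11, 0, 5, 0),
    ("Breast", 30, 3, 14, 1),
    ("Colorectal", 20, 1, 9, 3),
    ("Prostate", 19, 3, 10, 0),
    ("Gynecology", 32, 1, 16, 5),
    ("Head & Neck", 16, 0, 6, 0),
    ("Leukemia", 10, 0, 0, 0),
    ("Lung", 27, 0, 13, 2),
    ("Lymphoma", 11, 1, 1, 0),
    ("Melanoma", 3, 0, 1, 0),
    ("Supportive care", 9, 6, 2, 0),
    ("Palliative care", 9, 2, 6, 2),
]


def summarize(rows):
    """Sum the Table 2 columns and compute the share of trials with HRQL."""
    total = sum(r[1] for r in rows)
    primary = sum(r[2] for r in rows)
    secondary = sum(r[3] for r in rows)
    he = sum(r[4] for r in rows)
    # Assumption: primary and secondary counts do not overlap.
    hrql = primary + secondary
    return {
        "total": total,
        "primary": primary,
        "secondary": secondary,
        "he": he,
        "hrql_percent": round(100 * hrql / total, 1),
    }


summary = summarize(TABLE_2)
print(summary)
```

Under that assumption, 100 of the 197 listed trials (50.8%) include HRQL as either a primary or a secondary endpoint, and 13 include HE.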

Trial  selection  policy 

Limited  research  resources  and  budget  constraints 
often  necessitate  prioritising  of  HRQL  studies.  In 
the  context  of  clinical  trials  this  situation  is  not 


Table 1. Overview of target groups and survey response

Co-operative group                                                           Response

Cancer Research Council, UCL Cancer Trials Centre (UK)                       Yes
Medical Research Council, Clinical Trials Unit (UK)                          Yes
Swiss Group for Clinical Cancer Research (SIAKK/SAKK) (Switzerland)          Yes
International Breast Cancer Study Group (Switzerland)                        Yes
National Cancer Institute of Canada, Clinical Trials Group (Canada)          Yes
Eastern Cooperative Oncology Group (US)                                      Yes
Gynecologic Oncology Group (US)                                              Yes
Southwest Oncology Group (US)                                                Yes
National Surgical Adjuvant Breast and Bowel Project (US)                     Yes
Radiation Therapy Oncology Group (US)                                        Yes
Deutsche Krebsgesellschaft (as representative of the German                  Willing, but information
  Co-operative Groups (Germany))                                             not readily available
Interdisciplinary Group for Cancer Care Evaluation (Italy)                   No response
Federation Nationale des Centres de Lutte Contre le Cancer (France)          No response
Nordic Cancer Trial Group (Scandinavia)                                      No response
Cancer and Leukemia Group B (US)                                             Refused
European Organisation for the Research and Treatment of Cancer (Europe)      Refused


Table 2. Summary of ongoing clinical trials by disease site in 1998 for all ten groups surveyed

Cancer site*            Total number of    Number of trials    Number of trials     Number of trials
                        ongoing trials     with HRQL as        with HRQL as         including HE
                        in 1998            primary endpoint    secondary endpoint
                        (n)                n (%)               n (%)                n (%)

Brain                   11                 0                   5 (45)               0
Breast                  30                 3 (10)              14 (47)              1 (3)
Colorectal              20                 1 (5)               9 (45)               3 (15)
Prostate                19                 3 (16)              10 (53)              0
Gynecology              32                 1 (3)               16 (50)              5 (16)
Head & Neck             16                 0                   6 (38)               0
Leukemia                10                 0                   0                    0
Lung                    27                 0                   13 (48)              2 (7)
Lymphoma                11                 1 (9)               1 (9)                0
Melanoma                3                  0                   1 (33)               0
Multiple cancer sites
  Supportive care       9                  6 (67)              2 (22)               0
  Palliative care       9                  2 (22)              6 (67)               2 (22)

* Selection of type of cancer based on prevalence of the disease. It does not represent a complete overview of all ongoing clinical trials
per group.


different. The trials where HRQL is considered most important are those in which a large survival
advantage is not expected; those which compare very different treatments (e.g., chemotherapy vs.
radiation) that will likely result in different side effect profiles; and those in which patients are
symptomatic and the treatment is expected to relieve those symptoms.

In nine out of ten groups, there is a specific person or committee in the co-operative group
responsible for HRQL research issues such as trial selection, procedures for data collection,
implementation, and methodology. Only one of the co-operative groups has adopted a policy of
including HRQL in all cancer clinical trials as a standard (National Cancer Institute of Canada
(NCIC)). In all other groups, this decision depends on a number of factors such as study design,
research question, sample size, number of participating centres and countries, and a number of
population characteristics. A randomized study design allows for comparison of HRQL between
the two study arms and distinguishes the effect of trial intervention over time. The research
question determines the relevance of HRQL as an endpoint to that question, and the sample size
determines whether there will be a sufficient number of patients to provide an answer to the HRQL
research question. The number of participating centres and countries influences the feasibility of
HRQL assessment and the likelihood of compliance with questionnaire completion, the number of
languages in which the questionnaire will need to be available, as well as the funding needed.
Duration of the trial affects feasibility as well as funding issues. Financial constraints can play a
limiting role and necessitate prioritising of trials that include HRQL as an outcome parameter. Age
of the patients is most relevant in the paediatric population to determine whether self-assessment
of HRQL is possible. And lastly, the health care setting frequently influences the availability of
personnel to administer HRQL questionnaires.

Table 3 provides the detailed ratings of importance of different factors in the selection of trials
for inclusion of HRQL. Numbers represent the sum of responses from the 10 groups surveyed.
Globally, treatment characteristics appear to play a more important role in the selection of trials for
HRQL data evaluation than trial and population characteristics. Study design, available resources,
toxicity of treatment and absence of incremental survival advantage were the most important
factors.

There is often discussion as to whether HRQL is best collected within the actual clinical trial or as
a separate or companion protocol. When asked whether HRQL studies were conducted as an
integral part of the study protocol, seven groups responded 'yes, always', and three reported
'sometimes'. Six respondents stated that HRQL was never conducted with a separate protocol, and
four respondents stated that this was sometimes the case. To the question whether, when included
in a trial, HRQL was a mandatory aspect of the study for all participating centers, five groups
responded 'yes, always', one 'no, never' and four 'sometimes'.

Mode  of  assessment  and  choice  of  instrument 

All but one group use written questionnaires as a standard mode of HRQL assessment, and five
groups use in principle the same instrument in all studies (either EORTC QLQ-C30, FACT-G, or
LASA scale). For the other groups, the choice of the instrument depends mainly on the trial
characteristics, psychometric properties and its practicality for a particular trial, and to a lesser
degree on language availability, familiarity with the instrument or its theoretical foundation.
Examples of questionnaires that have been used previously in trials are SWOG QoL questionnaire,
CARES-SF, MOS-SF36, EORTC QLQ-C30, FACT-G, and LASA scale.

HRQL  research  guidelines 

All groups provide some form of specific instructions to the participating centers for the collection
of HRQL data. These can consist of written guidelines, training days, a HRQL training video,
procedure manuals for HRQL assessment, regular internal training at group meetings, and an
initiation site visit to discuss the HRQL aspects of the protocol.

Six out of ten groups have written internal procedures or guidelines for HRQL data analysis
and interpretation. Topics covered by all guidelines include the plan for statistical data analysis
Table 3. Average importance of factors influencing decisions to include HRQL as an endpoint in a
clinical trial

                                                     Importance rating
                                                     Not at all    A bit          Quite a lot    Very much

Trial characteristics
  Resources available                                0             [illegible]    4              4
  Study design                                       2             2              2              4
  Monitoring capacity                                2             2              5              1
  Representativeness of participating
    investigators and centers                        3             4              1              [illegible]
  Sample size                                        1             6              3              [illegible]
  Participating countries                            1             8              [illegible]    0
  Number of participating countries                  2             7              0              0
  Number of centers                                  4             6              0              0
  Duration of trial                                  5             4              1              0

Treatment characteristics
  Equal efficacy in terms of survival expected       0             0              2              [illegible]
  Toxicity of treatment                              0             0              [illegible]    3
  New treatment modality                             0             1              6              1
  New mode of administration                         1             0              5              3
  Palliative intent                                  2             1              2              3
  Curative intent                                    2             4              2              0

Population characteristics
  Age (children, adults, elderly)                    2             6              1              1
  Representativeness of trial population             0             3              4              3
  Health care setting (in- vs. outpatient
    department or home care)                         2             8              0              0

Instrument characteristics
  Availability of suitable instrument                0             6              1              3

Other
  Burden on patients
  Statement that HRQL outcome is critical for interpreting results
  Potential outside funding

Note: Answers shown above represent the sum of respondents choosing that category.


and calculation of sample size estimations. Handling of missing data is included in five out of six.
Other topics mentioned were the interpretation of results as clinically meaningful changes over time
(n = 1), in relation to clinical data (n = 3) or to other outcome measures (n = 1). Only one group
addresses the issue of the pooling of data for multinational analysis, which is not surprising as
the majority of respondents are groups that operate mainly at a national level.

Topics  that  are  not  addressed  at  all  in  existing 
guidelines  are  the  dissemination  of  results  within 
clinical  practice  and  the  role  of  HRQL  outcomes 
in  subsequent  treatment  decision  making. 


Interest  in  HRQL  research 

Four groups stated that their interest in HRQL research is very high, and five groups expressed
quite some interest (missing: n = 1).

Health  economics 

In general, the activity and interest in health economics is significantly less among all groups than
for HRQL. In three groups, health economics data in the form of resource use such as
hospitalization, medication, diagnostic tests used, and number of outpatient visits have never been
assessed in any trial. Four groups have a person or committee specifically responsible for health
economic issues; one group has a broad outcomes committee that can address health outcomes
including HE. None of the groups has a standard policy to collect HE data in each trial.

Four groups identified formal criteria that they followed when deciding whether to include HE as
an outcome measure. The most important considerations were the direct cost of the investigated
treatment(s), costs associated with treatment of adverse events, and potential financial
consequences of treatment for the hospital, practice, or patient. Trial population characteristics and
external requirements from health authorities and/or medical ethics committees play a less
important role in HE inclusion decisions.

Three  groups  have  some  sort  of  guidelines  for 
the  collection  of  HE  data.  None  of  the  groups  has 
internal  procedures  or  guidelines  for  the  analysis 
and  interpretation  of  HE  data. 

Interest  in  HE  research 

The perceived level of interest in HE is fairly low; one group is very interested, two groups are
quite interested and five groups indicated a bit of interest in the subject (missing: n = 1).

Discussion 

The objective of this study was to obtain up-to-date information on the processes and strategies
used by large national and international oncology co-operative groups to conduct HRQL research
and to ensure optimal HRQL data collection within their clinical trials. Questions were also
asked with regard to the groups' policy towards HE data collection, as it is felt that this is an
emerging, and complementary, field of research to that of HRQL [2].

One of the important limitations of our study is the size and representativeness of the study
sample. We approached only (i) large national or international co-operative groups that conduct
studies on more than one type of cancer and (ii) multi-continental groups focusing on one type of
cancer. Moreover, we did not include groups active in the field of pediatric oncology. As a result,
there are clear limitations regarding the representativeness of our sample and the generalizability
of the results. The majority of the participating co-operative groups is North American, leaving
other continents, and especially Europe, clearly under-represented. Non-participation in our survey
does not imply lack of experience or policies regarding HRQL and HE research. For instance, the
EORTC has been active in the field of HRQL research for many years, and has published on their
strategy to include HRQL as an endpoint in their clinical trials [3]. It would be inappropriate to
infer their policy from publicly available information as this will not provide the same level of
detail obtained by our survey. The same approach would also have to be applied to other
co-operative groups, and published reports from other multinational or national European groups
on HRQL and HE policies and strategies are scarcer.

One  may  ask  the  question  whether  Europe  is 
different  from  North  America  in  its  approach  to 
HRQL  research.  One  source  of  information  is  to 
look  at  the  stance  of  health  authorities  to  HRQL 
in  these  two  continents.  In  US,  a  1996  publication 
[4]  on  the  position  of  the  Federal  Drug  Adminis¬ 
tration  (FDA)  with  regards  to  HRQL  suggests 
that,  for  the  FDA,  HRQL  is  more  important  than 
traditional  measures  of  efficacy  such  as  tumor 
response  for  drugs  that  do  not  have  any  impact 
on  survival.  More  recently,  the  FDA  has  set  up  a 
special  committee  in  collaboration  with  outside 
researchers  to  investigate  further  the  role  of 
HRQL  within  the  registration  and  labeling  of
oncology  products  (i.e.  Subcommittee  of  the  Oncology
Drug  Advisory  Committee).  In  Europe,  the
European  Medicines  Evaluation  Agency  (EMEA)
cites  “symptom  control  backed  up  by  quality  of  life
assessments”  as  one  of  the  possible  secondary 
outcome  measures  in  their  1996  publication  of  the 
Committee  for  Proprietary  Medicinal  Products 
(CPMP)  [5].  However,  the  actual  role  that  HRQL 
data  have  played  in  drug  approval  decisions  by 
both  of  these  agencies  remains  to  be  elucidated  [6]. 
One  positive  example  in  the  US  is  the  role  played 
by  HRQL  data,  specifically  reduction  of  pain,  in 
the  approval  of  mitoxantrone  and  prednisone  for 
the  treatment  of  hormone-refractory  prostate 
cancer  [7].  Indeed,  it  may  be  assumed  that 
authorities  in  both  Europe  and  North  America  are 
at  the  early  stages  of  learning  about  the  value  of 
HRQL  research  and  findings  to  the  development
and  acceptance  of  new  cancer  therapies.  Within 
this  learning  environment,  co-operative  groups  in 
all  continents  may  play  an  important  role  in  setting
precedents,  disseminating  research  findings
and  advancing  methodologies  in  this  growing 
field. 

In  our  survey,  we  did  not  ask  respondents  to 
differentiate  between  trials  that  are  financially 
supported  publicly  or  by  the  pharmaceutical 
industry.  Clinical  trials  in  the  US  are  predominantly
sponsored  by  the  government,  whereas  co-operative
groups  in  Europe  and  Canada  have  more  of  a
mixture  of  government-  and  industry-sponsored
studies.  For  industry-sponsored  trials,  the  most
influential  factor  on  whether  to  include  HRQL  as 
an  endpoint  is  the  requirement  of  this  type  of  data 
by  the  regulatory  authorities.  From  the  perspective 
of  the  co-operative  group,  the  issue  of  available 
funding  is  of  great  importance  and  can  to  a  certain 
extent  influence  the  support  for  HRQL  assessments.
Industry  reimbursement  rates  per  patient
participating  in  a  trial  are  usually  greater  than 
funding  rates  from  public  sources  and  the  added 
resources  can  be  used  to  pursue  non-traditional 
endpoints  or  to  provide  financial  support  for 
studies  involving  non-pharmaceutical  therapeutic 
modalities.  It  would  be  very  interesting  to  conduct
a  similar  survey  among  pharmaceutical  companies
and  to  compare  their
policies  regarding  the  inclusion  of  HRQL  and
health  economic  research  questions  in  clinical  trials
with  those  of  co-operative  groups.

The  results  from  this  survey  among  co-operative 
groups  show  that  HRQL  is  recognized  as  an 
important,  although  usually  secondary,  outcome 
measure  in  oncology  trials.  Although  health  economics
data  such  as  hospitalizations  or  other  resource
use  play  a  much  lesser  role  in  the  clinical
trial  context,  their  role  in  reimbursement  decisions
may  be  more  prominent.  On  the  whole,  co-operative
groups  have  a  rather  flexible  policy  towards
the  inclusion  of  HRQL  (and  HE)  in  their  clinical
trials,  and  practice  is  very  much  on  a  case-by-case 
basis.  The  fact  that  many  groups  have  developed 
written  internal  procedures  or  guidelines  does  not 
mean  that  they  adopt  a  rigid  approach  towards 
design,  analysis  or  interpretation  of  results.  The 
purpose  of  guidelines  and  internal  procedures  is  to 
ensure  well-defined  study  protocols  and  promote
good-quality  studies.  This  is  underlined  by  the  fact
that  all  groups  recognized  the  importance  of 
training  of  clinical  trial  managers  for  HRQL  data 
collection,  an  aspect  often  neglected  in  industry- 
run  HRQL  studies.  The  fact  that  HRQL  evaluation
was  most  often  recognized  as  an  integral,  and
often  mandatory,  part  of  clinical  trials  is  a  promising
sign,  as  acceptance  and  understanding  of  this
outcome  by  treating  physicians  will  only  grow  with 
their  increased  exposure  to  its  analysis,  within  the 
context  of  other  clinical  findings. 

One  aspect  that  was  not  addressed  by  all  groups 
was  the  dissemination  and  positioning  of  HRQL 
findings  within  the  context  of  clinical  trial  evidence
and  the  implications  of  these  findings  for  clinical 
practice.  The  need  for  further  research  and  guidance
in  this  area  was  also  highlighted  in  several
surveys  of  practicing  oncologists  on  their  perception
of  HRQL  [8,  9].  Clearly,  an  essential  aspect  of
the  development  of  HRQL  research  remains  the
proper  interpretation  of  findings,  clear  communication
of  the  results  to  practicing  physicians  and
patients,  and,  ultimately,  the  integration  of  HRQL 
aspects  of  therapy  into  actual  treatment  decisions. 

In  conclusion,  the  results  of  this  survey  confirm 
the  impression  that  HRQL  research  is  a  growing
but  still  developing  field  in  the  context  of
clinical  trials.  Co-operative  groups  are  likely  to
continue  to  play  an  increasing  role  in  the  advancement
of  this  science  and  the  dissemination  of
findings  to  treating  physicians  and  their  patients. 
Their  role  in  the  promotion  of  health  economics 
research  may  be  a  lesser  one.  One  may  hope  that 
the  knowledge  and  experience  that  these  trial
groups  acquire  in  incorporating  HRQL  parameters
into  their  trials  may  serve  other  researchers  and
drug  sponsors  in  achieving  a  more  comprehensive 
assessment  of  the  impact  of  new  therapies  on 
cancer  patients. 